Abstract


Econometric  applications  of  kernel  estimators  are  proliferating,  suggesting  the  need  for 
convenient  variance  estimates  and  conditions  for  asymptotic  normality.     This  paper  develops 
a  general  "delta  method"  variance  estimator  for  functionals  of  kernel  estimators.     Also, 
regularity  conditions  for  asymptotic  normality  are  given,  along  with  a  guide  to  verifying 
them  for  particular  estimators.     The  general  results  are  applied  to  partial  means,  which  are 
averages  of  kernel  estimators  over  some  of  their  arguments  with  other  arguments  held  fixed. 
Partial  means  have  econometric  applications,  such  as  consumer  surplus  estimation,  and  are 
useful  for  estimation  of  additive  nonparametric  models. 

Keywords:     Kernel  estimation,  partial  means,  standard  errors,  delta  method,  functional 
estimation. 


1.        Introduction 

There  are  a  growing  number  of  applications  where  estimators  use  the  kernel  method  in 
their  construction,  i.e.  where  functionals  of  kernel  estimators  are  involved.     Examples 
include  average  derivative  estimation  (Hardle  and  Stoker,  1989,  and  Powell,  Stock,  and 
Stoker,  1989),  nonparametric  policy  analysis  (Stock,   1989),  consumer  surplus  estimation 
(Hausman  and  Newey,  1992),   and  others  that  are  the  topic  of  current  research.     An 
important  example  in  this  paper  is  a  partial  mean,  which  is  an  average  of  a  kernel 
regression  estimator  over  some  components,  holding  others  fixed.     The  growth  of  kernel 
applications suggests the need for a general variance estimator that applies to many
cases,  including  partial  means.     This  paper  presents  one  such  estimator.     Also,  the  paper 
gives  general  results  on  asymptotic  normality  of  functionals  of  kernel  estimators. 

Partial  means  control  for  covariates  by  averaging  over  them.     They  are  related  to 
additive  nonparametric  models  and  have  important  uses  in  economics,  as  further  discussed 
below.     It  is  shown  here  that  their  convergence  rate  is  determined  by  the  number  of 
components  that  are  averaged  out,  being  faster  the  more  components  that  are  averaged 
over. 

The  variance  estimator  is  based  on  differentiating  the  functional  with  respect  to  the 
contribution  of  each  observation  to  the  kernel.     A  more  common  method  is  to  calculate  the 
asymptotic  variance  formula  and  then  "plug-in"  consistent  estimators.     This  method  can  be 
quite  difficult  when  the  asymptotic  formula  is  complicated,  as  often  seems  to  be  the 
case.     In  contrast,  the  approach  described  here  only  requires  knowing  the  form  of  the 
functional  and  kernel.     Also,   it  gives  consistent  standard  errors  even  for  fixed 
bandwidths  (when  the  estimator  is  centered  at  its  limit),  unlike  the  more  common 
approach.     In  this  way  it  is  like  the  Huber  (1967)  asymptotic  variance  for  m-estimators. 
Also,   it  is  a  generalization  of  the  "delta  method"  for  functions  of  sample  means. 

An  alternative  approach  to  variance  estimation,   or  confidence  intervals,   is  the 
bootstrap.     The  bootstrap  may  give  consistent  confidence  intervals  (e.g.   by  the 
percentile method) for the same types of functionals considered here, although this does
not  appear  to  be  known.     In  any  case,  variance  estimates  are  useful  for  bootstrap 
improvements  to  the  asymptotic  distribution,  as  considered  in  Hall  (1992). 

The  variance  formula  given  here  has  antecedents  in  the  literature.     For  a  kernel 
density  at  a  point  it  is  equal  to  the  sample  variance  of  the  kernel  observations,  as 
recently  considered  by  Hall  (1992).     For  a  kernel  regression  at  a  point,   a  related 
estimator  was  proposed  by  Bierens  (1987).     Also,  the  standard  errors  for  average 
derivatives  in  Hardle  and  Stoker  (1989)  and  Powell,  Stock,   and  Stoker  (1989)  are  equal  to 
this  estimator  when  the  kernel  is  symmetric.     New  cases  included  here  are  partial  means 
and  estimators  that  depend  (possibly)  nonlinearly  on  all  of  the  density  or  regression 
function,  and  not  just  on  its  value  at  sample  points. 

Section  2  sets  up  m-estimators  that  depend  on  kernel  densities  or  regressions,   and 
gives  examples.     Section  3  gives  the  standard  errors,  i.e.  the  asymptotic  variance 
estimator.     Section  4  describes  partial  means  and  their  estimators,  and  associated 
asymptotic  theory.     Section  5  gives  some  general  lemmas  that  are  useful  for  the 
asymptotic  theory  of  partial  means,  and  more  generally  for  other  nonlinear  functionals  of 
kernel  estimators.     The  proofs  are  collected  in  Appendix  A,   and  Appendix  B  contains  some 
technical  lemmas. 


2.        The  Estimators 

The estimators considered in this paper are two-step estimators where the first step is a vector of kernel estimators. To describe the first step, let $y$ be an $r \times 1$ vector of variables, $x$ a $k \times 1$ vector of continuously distributed variables, and denote the product of the density $f_0(x)$ of $x$ with $E[y|x]$ as

(2.1)  $h_0(x) = E[y|x]\, f_0(x)$.


Let $K(u)$ denote a kernel function satisfying $\int K(u)\,du = 1$ and other conditions given in Section 4, where $u$ is $k \times 1$. Let $z_i$ $(i = 1, \ldots, n)$ denote data observations that include observations $y_i$ and $x_i$ on $y$ and $x$. Then for a bandwidth $\sigma > 0$ and $K_\sigma(u) = \sigma^{-k} K(u/\sigma)$, a kernel estimator of $h_0$ is

(2.2)  $\hat h(x) = n^{-1} \sum_{i=1}^{n} y_i K_\sigma(x - x_i)$.


This  estimator  is  the  first  step  considered  here. 
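As a minimal sketch of the first step (2.2), the Python below assumes a product Gaussian kernel for $K$ (the paper's own conditions on $K$ are more general); the function names `gaussian_kernel`, `scaled_kernel`, and `h_hat` are illustrative only.

```python
import math


def gaussian_kernel(u):
    """Product standard normal kernel K(u) for a k-vector u; integrates to one."""
    k = len(u)
    return math.exp(-0.5 * sum(c * c for c in u)) / (2.0 * math.pi) ** (k / 2.0)


def scaled_kernel(u, sigma, kernel=gaussian_kernel):
    """K_sigma(u) = sigma^{-k} K(u / sigma)."""
    k = len(u)
    return kernel([c / sigma for c in u]) / sigma ** k


def h_hat(x, ys, xs, sigma, kernel=gaussian_kernel):
    """Kernel estimator (2.2): h_hat(x) = n^{-1} sum_i y_i K_sigma(x - x_i).

    ys[i] is the r-vector y_i and xs[i] the k-vector x_i; returns an r-list.
    """
    n = len(xs)
    out = [0.0] * len(ys[0])
    for yi, xi in zip(ys, xs):
        w = scaled_kernel([a - b for a, b in zip(x, xi)], sigma, kernel) / n
        for j, yij in enumerate(yi):
            out[j] += yij * w
    return out
```

With $y_i = (1, q_i)$ the two returned components are $\hat f(x)$ and $\hat h_2(x)$, and their ratio is the kernel regression $\hat g(x)$ that appears in the examples.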

A second step allowed for in this paper is an m-estimator that depends on the estimated function $\hat h$. To describe such an estimator, let $\beta$ denote a vector of parameters, with true value $\beta_0$, and $m(z, \beta, h)$ a vector of functions that depend on the observation, parameter, and the function $h$. Here $m(z, \beta, h)$ is allowed to depend on the entire function $h$, and not just its value at observed points; see below for examples. Suppose that $E[m(z, \beta_0, h_0)] = 0$. A second step estimator $\hat\beta$ that solves a corresponding sample equation is

(2.3)  $n^{-1} \sum_{i=1}^{n} m(z_i, \hat\beta, \hat h) = 0$.

This  is  a  two-step  m-estimator  where  the  first  step  is  the  kernel  estimator  described 
above. 

This  estimator  includes  as  special  cases  functions  of  kernel  estimators  evaluated 
at  points,  e.g.   a  kernel  density  estimator  at  a  point.     Some  other  interesting  examples 
are  as  follows: 

Partial Means: An example that is (apparently) new is an average of a nonparametric regression over some variables holding others fixed. Let $q$ denote a random variable and $g_0(x) = E[q|x]$. Partition $x = (x_1, x_2)$ and let $\tilde x_2$ be a variable that is included in $z$ and has the same dimension as $x_2$, and $\bar x_1$ be some fixed value for $x_1$. Let $\tau(x_2)$ be some weight function, possibly associated with fixed "trimming" that keeps a denominator bounded away from zero. A partial mean is

(2.3a)  $\beta_0 = E[\tau(\tilde x_2)\, g_0(\bar x_1, \tilde x_2)]$.


This object is an average over some conditioning variables holding others fixed. It can be estimated by substituting a kernel estimator for $g_0$ and a sample average for the expectation. Let $y = (1, q)$, $\hat g(x) = \hat h_2(x)/\hat h_1(x) = \hat h_2(x)/\hat f(x)$ for the kernel density estimator $\hat f(x) = \hat h_1(x)$, and $\bar x_i = (\bar x_1, \tilde x_{2i})$. Then the estimator is

(2.4)  $\hat\beta = n^{-1} \sum_{i=1}^{n} \tau(\tilde x_{2i})\, \hat g(\bar x_i)$.

This estimator is a special case of equation (2.3) with $m(z, \beta, h) = \tau(\tilde x_2)\, h_2(\bar x_1, \tilde x_2)/h_1(\bar x_1, \tilde x_2) - \beta$. It shows how explicit estimators can be included as special cases of equation (2.3). Further discussion is given in Section 4.
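The estimator (2.4) can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes scalar $x_1$ and $x_2$ (so $k = 2$), a product Gaussian kernel, and that the averaging variable $\tilde x_2$ is the sample $x_2$ itself; `tau` is any user-supplied weight function and all names are hypothetical. The factor $n^{-1}$ in $\hat h_2$ and $\hat f$ cancels in the ratio $\hat g$.

```python
import math


def k_sigma(u, sigma):
    """Product Gaussian K_sigma(u) = sigma^{-k} K(u/sigma) (assumed kernel choice)."""
    k = len(u)
    s = sum((c / sigma) ** 2 for c in u)
    return math.exp(-0.5 * s) / ((2.0 * math.pi) ** (k / 2.0) * sigma ** k)


def g_hat(x, data, sigma):
    """Kernel regression h_2(x)/f(x) = sum_j q_j K / sum_j K; data = [(x1, x2, q), ...]."""
    num = den = 0.0
    for x1j, x2j, qj in data:
        w = k_sigma((x[0] - x1j, x[1] - x2j), sigma)
        num += qj * w
        den += w
    return num / den


def partial_mean(x1_bar, data, sigma, tau):
    """(2.4): beta_hat = n^{-1} sum_i tau(x2_i) g_hat(x1_bar, x2_i)."""
    n = len(data)
    return sum(tau(x2i) * g_hat((x1_bar, x2i), data, sigma) for _, x2i, _ in data) / n
```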

Differential Equation Solution: An estimator with economic applications is one that solves a differential equation depending on a nonparametric regression. To describe this estimator, let $y = (1, q)$ and suppose $x$ is two-dimensional (i.e. $k = 2$), with $x = (x_1, x_2)'$. Let $\bar x_1$ be some fixed value for $x_1$ and consider two possible values for $x_2$, denoted by $p^0$ and $p^1$, with $p^0 < p^1$. The estimator is given by

(2.5)  $\hat\beta = \hat S(p^0)$,  $d\hat S(p)/dp = -\hat g(\bar x_1 - \hat S(p), p)$,  $\hat S(p^1) = 0$,

for $\hat g(x) = \hat h_2(x)/\hat f(x)$. It is a special case where the $m(z, \beta, h)$ of equation (2.3) is the solution of the differential equation minus $\beta$. This example shows one way that $m(z, \beta, h)$ can be allowed to depend on the entire function $h$. The economic interpretation of $\hat\beta$ is a nonparametric estimate of the cost of a change of the price $p$ of a commodity $q$ from $p^0$ to $p^1$, for an individual with income $\bar x_1$ and demand function $g_0(x) = E[q|x]$. This example is analyzed in Hausman and Newey (1992), using results developed here.
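One way (2.5) might be solved numerically is sketched below: classical fourth-order Runge-Kutta with a fixed number of steps, integrating backward from $p^1$, where $\hat S(p^1) = 0$, down to $p^0$. The solver and step count are assumptions (the text only calls for an accurate ODE method), and `g` stands for any estimate of $g_0$ taking $(x_1, x_2)$, such as a kernel regression.

```python
def solve_surplus(g, x1_bar, p0, p1, steps=200):
    """Return beta = S(p0), where dS/dp = -g(x1_bar - S(p), p) and S(p1) = 0.

    Classical RK4 with a fixed step, run backward from p1 to p0.
    """
    def rhs(p, s):
        return -g(x1_bar - s, p)

    h = (p0 - p1) / steps  # negative step: we move from p1 down to p0
    p, s = p1, 0.0
    for _ in range(steps):
        k1 = rhs(p, s)
        k2 = rhs(p + h / 2, s + h * k1 / 2)
        k3 = rhs(p + h / 2, s + h * k2 / 2)
        k4 = rhs(p + h, s + h * k3)
        s += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        p += h
    return s
```

As a check, if demand depends only on income, $g(x_1, p) = x_1$, the exact solution is $\hat S(p^0) = \bar x_1 (1 - e^{p^0 - p^1})$, and the sketch reproduces it to high accuracy.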

Inverse Density Weighted Least Squares: An estimator that is useful for estimating the semiparametric generalized regression model $E[q|x] = t(x'\delta)$, where $t(\cdot)$ is an unknown transformation, is a weighted least squares estimator that can be described as follows. Let $\tau(x)$ be a density of an elliptically symmetric distribution (i.e. $\tau(x)$ is a density that depends only on $(x-\mu)'\Sigma(x-\mu)$ for some $\mu$ and $\Sigma$) that has bounded support. The estimator solves

(2.6)  $\hat\beta$ minimizes $\sum_{i=1}^{n} \hat f(x_i)^{-1} \tau(x_i) [q_i - x_i'\beta]^2$.

This estimator has the form given in equation (2.3), with $m(z, \beta, h) = h(x)^{-1} \tau(x)\, x\, [q - x'\beta]$. The weighting by the inverse density leads to $\hat\beta$ converging to the least squares projection of $E[q|x]$ on $x$ under the density $\tau(x)$, which is consistent for scaled coefficients of a generalized regression model, as discussed in Ruud (1986) and Li and Duan (1991). This estimator is analyzed in Newey and Ruud (1991), using results developed here.
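A sketch of (2.6) for a two-dimensional $x$ follows. It assumes a product Gaussian kernel for $\hat f$ and solves the $2 \times 2$ weighted normal equations by Cramer's rule; `tau` is a user-supplied density with bounded support (for example, a uniform density on a disk), and the function names are hypothetical.

```python
import math


def k_sigma(u, sigma):
    """Product Gaussian K_sigma(u) = sigma^{-k} K(u/sigma) (assumed kernel choice)."""
    k = len(u)
    s = sum((c / sigma) ** 2 for c in u)
    return math.exp(-0.5 * s) / ((2.0 * math.pi) ** (k / 2.0) * sigma ** k)


def idwls(xs, qs, sigma, tau):
    """(2.6): minimize sum_i f_hat(x_i)^{-1} tau(x_i) (q_i - x_i'beta)^2 for 2-vectors x_i."""
    n = len(xs)
    f_hat = [sum(k_sigma((xi[0] - xj[0], xi[1] - xj[1]), sigma) for xj in xs) / n
             for xi in xs]
    a11 = a12 = a22 = b1 = b2 = 0.0
    for xi, qi, fi in zip(xs, qs, f_hat):
        w = tau(xi) / fi  # inverse-density weight
        a11 += w * xi[0] * xi[0]
        a12 += w * xi[0] * xi[1]
        a22 += w * xi[1] * xi[1]
        b1 += w * xi[0] * qi
        b2 += w * xi[1] * qi
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

When $q_i$ is exactly linear in $x_i$ the weights do not matter and the sketch returns the true coefficients, which is a convenient first check.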

The results of this paper apply to each of these examples, as discussed below. They will also apply to other estimators, including those that minimize a quadratic form in a sample average depending on $\beta$ and $h$, or minimize a sample average, such as quasi-maximum likelihood estimators that depend on kernel estimators.


3.        The  Asymptotic  Variance  Estimator 

To form approximate confidence intervals and test statistics it is important to have consistent standard errors. To motivate the form of the asymptotic variance estimator, it is helpful to briefly sketch the asymptotic distribution theory. Expanding the left-hand side of equation (2.3) around $\beta_0$ and solving for $\hat\beta - \beta_0$ gives

(3.1)  $\hat\beta - \beta_0 = -\left[n^{-1} \sum_{i=1}^{n} \partial m(z_i, \bar\beta, \hat h)/\partial\beta\right]^{-1} \hat m_n(\beta_0)$,  $\hat m_n(\beta) = \sum_{i=1}^{n} m(z_i, \beta, \hat h)/n$,

where $\bar\beta$ is a mean value. By the uniform law of large numbers discussed in Section 4, $n^{-1} \sum_{i=1}^{n} \partial m(z_i, \bar\beta, \hat h)/\partial\beta$ will converge in probability to

$M = E[\partial m(z, \beta_0, h_0)/\partial\beta]$,

so that the asymptotic distribution of $\hat\beta$ will be determined by $\hat m_n(\beta_0)$. In Section 4 conditions will be given for existence of $\alpha \ge 0$ such that

(3.1a)  $\sqrt{n}\, \sigma^{\alpha}\, \hat m_n(\beta_0) \xrightarrow{d} N(0, V)$.

The magnitude of $\alpha$ will be determined by the form of $E[m(z, \beta_0, h)]$ as a function of $h$, with $\alpha$ being smaller the more dimensions being integrated over in $E[m(z, \beta_0, h)]$. By the Slutsky theorem the asymptotic distribution for $\hat\beta$ will be

(3.1b)  $\sqrt{n}\, \sigma^{\alpha} (\hat\beta - \beta_0) \xrightarrow{d} N(0, M^{-1} V M^{-1\prime})$.


A consistent asymptotic variance estimator can be constructed by substituting estimates for true values in the formula $M^{-1} V M^{-1\prime}$. It is easy to construct an estimator of $M$, as

(3.1c)  $\hat M = n^{-1} \sum_{i=1}^{n} \partial m(z_i, \hat\beta, \hat h)/\partial\beta$.


Finding a consistent estimator of $V$ is more difficult, because of the need to account for the presence of $\hat h$ in $\hat m_n(\beta_0)$. One common approach to this problem is to calculate the asymptotic variance, and then form an estimator by substituting estimates for unknown functions, such as sample averages for expectations. This approach can be difficult when the asymptotic variance is very complicated. Also, it may be sensitive to the bandwidth parameter, because the variance formula is only valid in the limit as $\sigma \to 0$.

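The asymptotic variance estimator here is constructed by estimating the influence of each observation in $\hat h$ on $\sum_{i=1}^{n} m(z_i, \hat\beta, \hat h)/n$. Let $\zeta$ denote a scalar number and let

(3.2)  $\hat\delta_i = \partial\left[n^{-1} \sum_{j=1}^{n} m(z_j, \hat\beta, \hat h + \zeta\, y_i K_\sigma(\cdot - x_i))\right]/\partial\zeta \,\Big|_{\zeta = 0}$.

The interpretation of $\hat\delta_i$ is that it estimates the first-order effect of the $i$th observation in $\hat h$ on $\sum_{j=1}^{n} m(z_j, \hat\beta, \hat h)/n$. In this sense it is an "influence function" estimator. The variance can be estimated by including this term with $m(z_i, \hat\beta, \hat h)$ in a sample variance, as in

(3.3)  $\hat V = \sum_{i=1}^{n} \hat\psi_i \hat\psi_i'/n$,  $\hat\psi_i = m(z_i, \hat\beta, \hat h) + \hat\delta_i - \sum_{j=1}^{n} \hat\delta_j/n$.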

An asymptotic variance estimator for $\hat\beta$ can then be constructed by combining $\hat V$ with a Jacobian estimator in the usual way, as in

(3.4)  $\widehat{\mathrm{Var}}(\hat\beta) = \hat M^{-1} \hat V \hat M^{-1\prime}$,  $\hat M = n^{-1} \sum_{i=1}^{n} \partial m(z_i, \hat\beta, \hat h)/\partial\beta$.

In Section 5 conditions will be given that are sufficient for $\sigma^{2\alpha}\, \widehat{\mathrm{Var}}(\hat\beta) \xrightarrow{p} M^{-1} V M^{-1\prime}$. Consequently, inference procedures based on $\hat\beta - \beta_0$ being normally distributed with mean $0$ and variance $\widehat{\mathrm{Var}}(\hat\beta)/n$ will be asymptotically valid. For example, $\hat\beta_j \pm 1.96\,[\widehat{\mathrm{Var}}(\hat\beta)_{jj}/n]^{1/2}$ will be an asymptotic 95 percent confidence interval. It is interesting to note that the form of $\widehat{\mathrm{Var}}(\hat\beta)$ does not depend on the convergence rate for $\hat\beta$ (i.e. on $\alpha$), but that its large sample behavior will.

This asymptotic variance estimator accounts for the presence of $\hat h$ by including the terms $\hat\delta_i$ in $\hat\psi_i$. These terms are straightforward to compute, requiring only knowledge of the form of $m(z, \beta, h)$ and the kernel. In particular, $\hat\delta_i$ can be calculated by analytic differentiation with respect to the scalar $\zeta$. Alternatively, if the analytic formula is very hard to construct, $\hat\delta_i$ can be calculated as the numerical derivative of $\sum_{j=1}^{n} m(z_j, \hat\beta, \hat h + \zeta\, y_i K_\sigma(\cdot - x_i))/n$ with respect to $\zeta$.
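A sketch of (3.2) to (3.4) for a scalar $\beta$, with $\hat\delta_i$ computed as the central-difference numerical derivative just described. It assumes a product Gaussian kernel, represents $h$ as a Python callable returning an $r$-list, and takes the moment function `m(z, beta, h)` and its derivative `dm_dbeta(z, beta, h)` as user inputs; these names are hypothetical. Each $\hat\delta_i$ re-evaluates the whole sample moment, so the cost grows roughly like $n^3$ here, which is fine for illustration but not tuned for large samples.

```python
import math


def k_sigma(u, sigma):
    """Product Gaussian K_sigma(u) = sigma^{-k} K(u/sigma) (assumed kernel choice)."""
    k = len(u)
    s = sum((c / sigma) ** 2 for c in u)
    return math.exp(-0.5 * s) / ((2.0 * math.pi) ** (k / 2.0) * sigma ** k)


def make_h_hat(ys, xs, sigma):
    """Return the kernel estimator (2.2) as a callable x -> r-list."""
    n = len(xs)

    def h(x):
        out = [0.0] * len(ys[0])
        for yi, xi in zip(ys, xs):
            w = k_sigma([a - b for a, b in zip(x, xi)], sigma) / n
            for r, yir in enumerate(yi):
                out[r] += yir * w
        return out

    return h


def delta_method_variance(m, dm_dbeta, zs, ys, xs, beta_hat, sigma, eps=1e-6):
    """Var_hat(beta_hat) = M_hat^{-2} V_hat for scalar beta, with delta_i from (3.2)."""
    n = len(zs)
    h = make_h_hat(ys, xs, sigma)

    def perturbed(i, zeta):
        # h_hat + zeta * y_i * K_sigma(. - x_i): shift the weight on observation i
        def h_zeta(x):
            w = zeta * k_sigma([a - b for a, b in zip(x, xs[i])], sigma)
            return [hr + w * yr for hr, yr in zip(h(x), ys[i])]
        return h_zeta

    def mean_m(hfun):
        return sum(m(z, beta_hat, hfun) for z in zs) / n

    # central-difference derivative with respect to zeta at zeta = 0
    deltas = [(mean_m(perturbed(i, eps)) - mean_m(perturbed(i, -eps))) / (2 * eps)
              for i in range(n)]
    d_bar = sum(deltas) / n
    psi = [m(z, beta_hat, h) + d - d_bar for z, d in zip(zs, deltas)]  # (3.3)
    v_hat = sum(p * p for p in psi) / n
    m_hat = sum(dm_dbeta(z, beta_hat, h) for z in zs) / n
    return v_hat / (m_hat * m_hat)  # (3.4) in the scalar case
```

For the density at a point, $m(z, \beta, h) = h(x) - \beta$ with $y = 1$, so $\hat M = -1$ and the result should equal the sample variance of $K_\sigma(x - x_i)$, as in the density example discussed in the text; that is a natural first check.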

Here $\hat V = \sum_{i=1}^{n} \hat\psi_i \hat\psi_i'/n$ is a "delta-method" variance for kernel estimators. It is exactly analogous to delta method variances for parametric estimators. For example, if $\hat h$ were a sample mean rather than a kernel estimator, say $\hat h = \bar y = \sum_{j=1}^{n} y_j/n$, then the analog of $\hat\delta_i$ is $\partial\{\sum_{j=1}^{n} m(z_j, \hat\beta, \hat h + \zeta y_i)/n\}/\partial\zeta = \hat m_h y_i$, where $\hat m_h = n^{-1} \sum_{j=1}^{n} \partial m(z_j, \hat\beta, \hat h)/\partial h$. Thus, the analogous influence function estimator would be $\hat\psi_i = m(z_i, \hat\beta, \hat h) + \hat m_h y_i - (\sum_{j=1}^{n} \hat m_h y_j/n) = m(z_i, \hat\beta, \hat h) + \hat m_h (y_i - \bar y)$, the usual delta-method formula for the presence of a sample average in an m-estimator. Another feature of $\widehat{\mathrm{Var}}(\hat\beta)$ is that it does not rely on the bandwidth shrinking to zero for its validity. If the bandwidth were held fixed it would be a consistent estimator of the asymptotic variance of $\sqrt{n}(\hat\beta - \beta_\sigma)$, where $\beta_\sigma$ is the limit of $\hat\beta$ when the bandwidth is held fixed at $\sigma$.

The terms $m(z_i, \hat\beta, \hat h)$ and $\sum_{j=1}^{n} \hat\delta_j/n$ are asymptotically negligible in $\hat\psi_i$ when the convergence rate of $\hat\beta$ is slower than $1/\sqrt{n}$. They are retained because they are easy to compute and could conceivably improve the asymptotic approximation. Also, for analogous reasons the formula for $\hat\delta_i$ does not distinguish between elements of $\hat h$ that affect the asymptotic distribution and those that do not (e.g. between pointwise density levels and derivatives, where the slower convergence rate of the derivative will dominate).

Some examples may serve to illustrate the form of this estimator. The simplest example is a density estimator $\hat\beta = \hat f(x)$ at some $x$, where the asymptotic variance estimator is $\widehat{\mathrm{Var}}(\hat\beta) = \sum_{i=1}^{n} K_\sigma(x - x_i)^2/n - [\sum_{j=1}^{n} K_\sigma(x - x_j)/n]^2$, the sample variance of $K_\sigma(x - x_i)$. This estimator was recently considered by Hall (1992). Other examples are:


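Partial Means: Here $\hat\delta_i$ can be obtained by explicit differentiation of $n^{-1} \sum_{j=1}^{n} \tau(\tilde x_{2j}) [\hat h_2(\bar x_j) + \zeta q_i K_\sigma(\bar x_j - x_i)]/[\hat f(\bar x_j) + \zeta K_\sigma(\bar x_j - x_i)]$, as

$\hat\delta_i = n^{-1} \sum_{j=1}^{n} \tau(\tilde x_{2j})\, \hat f(\bar x_j)^{-1} [q_i - \hat f(\bar x_j)^{-1} \hat h_2(\bar x_j)]\, K_\sigma(\bar x_j - x_i)$.

The asymptotic variance estimator can then be formed as

(3.5)  $\widehat{\mathrm{Var}}(\hat\beta) = \sum_{i=1}^{n} \hat\psi_i^2/n$,  $\hat\psi_i = \tau(\tilde x_{2i})\, \hat h_2(\bar x_i)/\hat f(\bar x_i) - \hat\beta + \hat\delta_i - \sum_{j=1}^{n} \hat\delta_j/n$.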

Differential Equation Solution: It is possible to derive an analytical expression for $\hat\delta_i$, but this expression is quite complicated and difficult to evaluate. An alternative approach, used by Hausman and Newey (1992), is to numerically differentiate the numerical solution to $dS(p)/dp = -[\hat h_2(\bar x_1 - S(p), p) + \zeta q_i K_\sigma((\bar x_1 - S(p), p) - x_i)]/[\hat f(\bar x_1 - S(p), p) + \zeta K_\sigma((\bar x_1 - S(p), p) - x_i)]$ with respect to $\zeta$ to form $\hat\delta_i$. This approach is quite feasible using existing fast and accurate numerical algorithms for ordinary differential equations.


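Inverse Density Weighted Least Squares: Let $\hat u_i = q_i - x_i'\hat\beta$. The variance estimator is

(3.5a)  $\widehat{\mathrm{Var}}(\hat\beta) = \hat M^{-1} \left[\sum_{i=1}^{n} \hat\psi_i \hat\psi_i'/n\right] \hat M^{-1}$,  $\hat M = n^{-1} \sum_{i=1}^{n} \hat f(x_i)^{-1} \tau(x_i)\, x_i x_i'$,

$\hat\psi_i = \hat f(x_i)^{-1} \tau(x_i)\, x_i \hat u_i + \hat\delta_i - \sum_{j=1}^{n} \hat\delta_j/n$,  $\hat\delta_i = -n^{-1} \sum_{j=1}^{n} \tau(x_j)\, \hat f(x_j)^{-2} x_j \hat u_j K_\sigma(x_j - x_i)$.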


For  partial  means,   asymptotic  normality  and  consistency  of  the  variance  estimator  are 
shown  in  Section  4,  using  the  Lemmas  of  Section  5.     Corresponding  theoretical  results  for 
the  other  examples  are  given  elsewhere. 
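The partial-means formulas for $\hat\delta_i$ and (3.5) translate directly into code. A minimal sketch, assuming scalar $x_1$ and $x_2$, a product Gaussian kernel, and that $\tilde x_2$ is the sample $x_2$ itself; it returns $\hat\beta$, $\widehat{\mathrm{Var}}(\hat\beta)$, the half-width $1.96[\widehat{\mathrm{Var}}(\hat\beta)/n]^{1/2}$ of the asymptotic 95 percent interval, and the $\hat\psi_i$. All names are hypothetical.

```python
import math


def k_sigma(u, sigma):
    """Product Gaussian K_sigma(u) = sigma^{-k} K(u/sigma) (assumed kernel choice)."""
    k = len(u)
    s = sum((c / sigma) ** 2 for c in u)
    return math.exp(-0.5 * s) / ((2.0 * math.pi) ** (k / 2.0) * sigma ** k)


def partial_mean_se(x1_bar, data, sigma, tau):
    """Partial mean (2.4) with the delta-method variance (3.5); data = [(x1, x2, q), ...]."""
    n = len(data)
    # kern[j][i] = K_sigma(xbar_j - x_i), with xbar_j = (x1_bar, x2_j)
    kern = [[k_sigma((x1_bar - x1i, x2j - x2i), sigma) for x1i, x2i, _ in data]
            for _, x2j, _ in data]
    f = [sum(row) / n for row in kern]                                    # f_hat(xbar_j)
    h2 = [sum(w * qi for w, (_, _, qi) in zip(row, data)) / n for row in kern]  # h2_hat(xbar_j)
    t = [tau(x2j) for _, x2j, _ in data]
    beta = sum(t[j] * h2[j] / f[j] for j in range(n)) / n
    # analytic delta_i from differentiating in zeta
    delta = [sum(t[j] / f[j] * (data[i][2] - h2[j] / f[j]) * kern[j][i] for j in range(n)) / n
             for i in range(n)]
    d_bar = sum(delta) / n
    psi = [t[i] * h2[i] / f[i] - beta + delta[i] - d_bar for i in range(n)]
    var = sum(p * p for p in psi) / n
    return beta, var, 1.96 * math.sqrt(var / n), psi
```

If $q$ is constant, $\hat g$ is constant, every $\hat\delta_i$ vanishes, and with $\tau \equiv 1$ the variance estimate is zero, which gives a quick sanity check.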


4.        Partial  Means 

Partial means have a number of applications in economics. For example, they can be used to approximate the solution to the differential equation described in Section 2. Dropping the $S(p)$ term from inside $g_0(\bar x_1 - S(p), p)$ leads to an approximation as $\beta_0 = \int_{p^0}^{p^1} g_0(\bar x_1, p)\,dp = E[(p^1 - p^0)\, g_0(\bar x_1, \tilde x_2)]$, where $\tilde x_2$ is distributed uniformly on $[p^0, p^1]$. It is known that this approximation is quite good in many economic examples, where $S(p)$ is a small proportion of $\bar x_1$ (see Willig, 1978). This is a partial mean as described in Section 2. It can be estimated by

(4.1)  $\hat\beta = (p^1 - p^0) \sum_{i=1}^{n} \hat g(\bar x_1, p_i)/n$,

where $p_i$ is drawn from a uniform distribution on $[p^0, p^1]$. This is a simulation estimator similar in spirit to that of Lerman and Manski (1981).
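A sketch of the simulation estimator (4.1). Here `g_hat` is any fitted regression taking $(x_1, x_2)$, the number of uniform draws is left as a parameter (in (4.1) it equals the sample size $n$), and the seeded generator is only an assumption made for reproducibility.

```python
import random


def simulated_partial_mean(g_hat, x1_bar, p0, p1, n_draws, seed=0):
    """(4.1): (p1 - p0) times the mean of g_hat(x1_bar, p_i) over p_i ~ Uniform[p0, p1]."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    total = sum(g_hat(x1_bar, rng.uniform(p0, p1)) for _ in range(n_draws))
    return (p1 - p0) * total / n_draws
```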

Partial  means  are  also  of  interest  from  a  purely  statistical  point  of  view,  as 
dimension attenuation devices. Like $E[q|x_1]$, the partial mean is a function of a smaller dimensional argument. Consequently, partial mean estimators will converge faster than estimators of $g_0(x)$. However, unlike $E[q|x_1]$, a partial mean controls for the covariates $x_2$ in an average way.

The way in which partial means control for covariates is illustrated by their relationship to additive nonparametric models. Suppose that the conditional expectation takes an additive form, $E[q|x] = g_{10}(x_1) + g_{20}(x_2)$, and that $E[\tau(\tilde x_2)] = 1$. Then

(4.2)  $E[\tau(\tilde x_2)\, g_0(x_1, \tilde x_2)] = g_{10}(x_1) + E[\tau(\tilde x_2)\, g_{20}(\tilde x_2)]$.

Thus, as a function of $x_1$, the partial mean estimates the corresponding component of an additive model, up to a constant.

In  comparison  with  other  additive  model  estimators,  partial  means  are  easier  to 
compute but may be less asymptotically efficient. Unlike the alternating conditional expectation estimator for additive models (ACE, Breiman and Friedman, 1985), the partial mean is an explicit functional, so the kernel estimator will not require iteration.
However,   because  the  partial  mean  does  not  impose  additivity  it  may  be  a  less  efficient 
estimator.     Also,  the  partial  mean  depends  on  the  full  conditional  expectation,  so  the 
curse  of  dimensionality  may  result  in  slower  convergence  to  the  limiting  distribution. 

The  partial  mean  and  ACE  are  different  statistical  objects  when  no  restrictions  are 
placed  on  E[q|x].  The  partial  mean  is  given  in  equation  (2.3a).  The  ACE  object  is  the 
mean-square projection of $E[q|x]$ on the set of functions of the form $g_1(x_1) + g_2(x_2)$.
These  estimators  summarize  different  features  of     E[q|x].     If  one  is  interested  in 
normal. Let the $u$ argument of $K(u)$ be partitioned conformably with $x$, and let $f_{20}(x_2)$ denote the true density of $\tilde x_2$. The asymptotic variance of the partial mean estimator will be

(4.3)  $V = \left[\int \left\{\int K(u_1, u_2)\,du_2\right\}^2 du_1\right] \cdot \int f_0(\bar x_1, t)^{-1}\, \tau(t)^2 f_{20}(t)^2\, \mathrm{Var}(q \mid x = (\bar x_1, t))\,dt$.

Theorem 4.1: Suppose that i) $E[|q|^4] < \infty$, and $E[|q|^4 \mid x] f_0(x)$ and $f_0(x)$ are bounded; ii) Assumptions H and K are satisfied for $d \ge s$; iii) $\tau(x_2)$ is bounded and zero except on a compact set where $f_0(\bar x_1, x_2)$ is bounded away from zero; iv) $\tau(x_2)$ and $f_{20}(x_2)$ are continuous a.e., $f_{20}(x_2)$ is bounded, $E[q|x]$ and $E[q^2|x]$ are continuous, and for some $\varepsilon > 0$, $\int \sup_{\|x_1 - \bar x_1\| \le \varepsilon} \{1 + E[q^4 \mid x = (x_1, x_2)]\}\, f_0(x_1, x_2)\,dx_2 < \infty$; v) $n\sigma^{2k - k_1}/\ln(n) \to \infty$ and $n\sigma^{k_1 + 2s} \to 0$. Then for $\hat\beta$ in equation (2.4), $\sqrt{n}\,\sigma^{k_1/2}(\hat\beta - \beta_0) \xrightarrow{d} N(0, V)$. If, in addition, $n\sigma^{\ldots} \to \infty$, then $\sigma^{k_1}\hat V \xrightarrow{p} V$.


The  conditions  here  embody  "undersmoothing,"  meaning  that  the  bias  goes  to  zero  faster 
than  the  variance.     Undersmoothing  is  reflected  in  the  conclusion,  where  the  limiting 
distribution  is  centered  at  zero,  rather  than  at  a  bias  term. 

An improved convergence rate for partial means over pointwise estimators is embodied in the normalizing factor $\sqrt{n}\,\sigma^{k_1/2}$ for the asymptotic distribution. The rate implied by the asymptotic distribution result is $(n\sigma^{k_1})^{-1/2}$, while the corresponding rate from the usual asymptotic normality result for pointwise estimators is $(n\sigma^{k})^{-1/2}$, which converges to zero more slowly by the factor $\sigma^{-(k - k_1)/2}$, since $\sigma$ goes to zero. Furthermore, the rate for partial means is exactly the nonparametric pointwise rate when the dimension is $k_1$. Thus, the more components are averaged out, the smaller will be $k_1$, and hence the faster will be the convergence rate.

One important feature of this result is hypothesis iii), which amounts to a "fixed trimming" condition, where the density of $x$ is bounded away from zero where $\tau$ is nonzero. This condition is theoretically convenient because it avoids the "denominator problem." It is used here because it is not restrictive in many cases (e.g. pointwise estimation) and because the resulting theory roughly corresponds to trimming based on a large auxiliary sample, which is often available. It might be possible to modify the results to allow trimming to depend on the sample size, e.g. as in Robinson (1988), but this modification would be very complicated.

Estimators will be $\sqrt{n}$-consistent when they are full means, i.e. are averages over all components. There are many interesting examples of such estimators, such as the policy analysis estimator of Stock (1989). The general conditions given here are slightly different for that case, so it is helpful to describe the estimator and result in a slightly different way. Suppose $\beta_0 = E[a_0(z)\, g_0(\tilde x)]$, where $g_0(x) = E[q|x]$, $a_0(z)$ is some function of the data, and $\tilde x$ is a continuously distributed variable that may be different than $x$. A kernel estimator of $\beta_0$, along with the associated asymptotic variance estimator from equations (3.3) and (3.4), for $\hat g(x) = \hat h_2(x)/\hat h_1(x)$ as above, is

(4.4)  $\hat\beta = \sum_{i=1}^{n} a_0(z_i)\, \hat g(\tilde x_i)/n$,  $\widehat{\mathrm{Var}}(\hat\beta) = \sum_{i=1}^{n} \hat\psi_i^2/n$,  $\hat\psi_i = a_0(z_i)\, \hat g(\tilde x_i) - \hat\beta + \hat\delta_i - \sum_{j=1}^{n} \hat\delta_j/n$,

$\hat\delta_i = \sum_{j=1}^{n} a_0(z_j)\, \hat f(\tilde x_j)^{-1} [q_i - \hat g(\tilde x_j)]\, K_\sigma(\tilde x_j - x_i)/n$.
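A sketch of (4.4) for scalar $x$ and $\tilde x$, assuming a Gaussian kernel. The weights $a_0(z_i)$ are passed in as a list, and the trimming of condition iii) of Theorem 4.2 is left to the caller (for instance by setting $a_0(z_i) = 0$ outside a compact set). The structure mirrors the partial-means formulas, with $a_0(z_j)$ in place of $\tau(\tilde x_{2j})$; names are hypothetical.

```python
import math


def k_sigma(u, sigma):
    """Scalar Gaussian K_sigma(u) = sigma^{-1} K(u/sigma) (assumed kernel choice, k = 1)."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)


def full_mean_se(xs, xts, qs, a, sigma):
    """(4.4): beta_hat = sum_i a_i g_hat(xt_i)/n and Var_hat = sum_i psi_i^2/n.

    xs: regressors x_i; xts: evaluation points xt_i; qs: q_i; a: weights a_0(z_i).
    """
    n = len(xs)
    kern = [[k_sigma(xtj - xi, sigma) for xi in xs] for xtj in xts]  # kern[j][i]
    f = [sum(row) / n for row in kern]                                 # f_hat(xt_j)
    g = [sum(w * qi for w, qi in zip(row, qs)) / n / fj for row, fj in zip(kern, f)]
    beta = sum(aj * gj for aj, gj in zip(a, g)) / n
    delta = [sum(a[j] / f[j] * (qs[i] - g[j]) * kern[j][i] for j in range(n)) / n
             for i in range(n)]
    d_bar = sum(delta) / n
    psi = [a[i] * g[i] - beta + delta[i] - d_bar for i in range(n)]
    return beta, sum(p * p for p in psi) / n
```

With constant $q = c$ the $\hat\delta_i$ vanish and the variance estimate reduces to $c^2$ times the sample variance of the $a_0(z_i)$.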

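The asymptotic variance of this estimator will be

(4.5)  $V = E[\psi_i^2]$,  $\psi_i = a_0(z_i)\, g_0(\tilde x_i) - \beta_0 + E[a_0(z) \mid \tilde x = x_i]\, \tilde f_0(x_i)\, f_0(x_i)^{-1} [q_i - g_0(x_i)]$,

where $\tilde f_0$ denotes the density of $\tilde x$.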

Theorem 4.2: Suppose that i) $E[|q|^4] < \infty$, $E[|q|^4 \mid x] f_0(x)$ and $f_0(x)$ are bounded, and $E[|a_0(z)|^2] < \infty$; ii) Assumptions K and H are satisfied for $d \ge s$; iii) $a_0(z)$ is zero if $\tilde x$ is not in a compact set, and $f_0(x)$ is bounded away from zero on that compact set; iv) $E[a_0(z) \mid \tilde x = x]$ and $f_0(x)$ are continuous a.e. and bounded for $x$ inside the compact set of iii); v) $n\sigma^{2k}/\ln(n) \to \infty$ and $n\sigma^{2s} \to 0$. Then $\sqrt{n}(\hat\beta - \beta_0) - \sum_{i=1}^{n} \psi_i/\sqrt{n} \xrightarrow{p} 0$ and $\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N(0, V)$. If, in addition, $n\sigma^{3k} \to \infty$, then $\sum_{i=1}^{n} \|\hat\psi_i - \psi_i\|^2/n \xrightarrow{p} 0$ and $\hat V \xrightarrow{p} V$.


This  result  gives  asymptotic  normality  for  a  trimmed  version  of  Stock's  (1989)  estimator, 
as well as being a general result on the asymptotic normality of sample moments that are
random  linear  functions  of  kernel  regressions. 


5.        Useful  Lemmas 

Several intermediate results of a familiar type are useful in developing asymptotic theory for the m-estimator $\hat\beta$ described in Section 2. Uniform convergence results are useful for showing consistency of $\hat\beta$ and of the Jacobian term in the expansion of (3.1). Asymptotic normality of $\hat\beta$ will follow from $\sqrt{n}\,\sigma^{\alpha}\, \hat m_n(\beta_0) \xrightarrow{d} N(0, V)$. Also, $\sigma^{2\alpha}\hat V \xrightarrow{p} V$ is very important for consistent estimation of the asymptotic variance. In addition, when $\alpha = 0$, corresponding to $\sqrt{n}$-consistency of $\hat\beta$, it can be shown that there is $\psi_i$ such that $\sqrt{n}\, \hat m_n(\beta_0) = \sum_{i=1}^{n} \psi_i/\sqrt{n} + o_p(1)$ and $\sum_{i=1}^{n} \|\hat\psi_i - \psi_i\|^2/n \xrightarrow{p} 0$. Primitive conditions for each of these results are given in this Section. Examples of how these results can be used to derive results for particular functionals are given in the proofs of Theorems 4.1 and 4.2, and in the proofs of results in Hausman and Newey (1992), Matzkin and Newey (1992), and Newey and Ruud (1991).

A number of additional regularity conditions are used in the analysis to follow. The first regularity condition imposes some moment assumptions. For a matrix $B$ let $\|B\| = [\mathrm{tr}(B'B)]^{1/2}$, where $\mathrm{tr}(\cdot)$ denotes the trace of a square matrix.

Assumption Y: For $p \ge 4$, $E[\|y\|^p] < \infty$, $E[\|y\|^p\mid x]f_0(x)$ is bounded, and $E[\|m(z,\beta_0,h_0)\|^2] < \infty$.


This condition, like Assumptions K and H, is a standard type of condition. The fourth moment condition for $y$ is useful for obtaining optimal convergence rates for $\hat h$.

For the asymptotic theory, it is useful to impose smoothness conditions on $m(z,\beta,h)$ as a function of $h$, in terms of a metric on the set of possible functions. Here, the metric is the supremum norm on the function and its derivatives, a Sobolev norm. The supremum norm is quite strong, but uniform convergence rates for a kernel estimator and its derivatives are either well known or straightforward to derive (see Appendix B), and are not much slower than $L_2$ convergence rates (the uniform rates carry only an additional "log term"). Consequently, conditions for remainder terms to go to zero fast enough to achieve asymptotic normality will not be much stronger with the supremum norm than they would be with $L_2$ norms. Furthermore, it is quite easy to show smoothness in the supremum norm for many functionals.

To define the norm, for a matrix of functions $B(x)$ let $\partial^\ell B(x)/\partial x^\ell$ denote any vector consisting of all distinct $\ell$th order partial derivatives of all elements of $B(x)$. Also, let $\mathcal{X}$ denote a set that is contained in the support of $x$, and for any nonnegative integer $j$ let

$\|B\|_j = \max_{\ell \le j} \sup_{x \in \mathcal{X}} \|\partial^\ell B(x)/\partial x^\ell\|,$

where $\|B\|_j$ is taken equal to infinity if the derivatives do not exist for some $x \in \mathcal{X}$. This is a Sobolev supremum norm of order $j$.
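For a scalar function of one variable on an interval $\mathcal{X} = [lo, hi]$, the norm $\|B\|_j$ can be approximated numerically. The sketch below is only an illustration under that assumption and is not part of the paper's method: derivatives are replaced by repeated central differences on a uniform grid, so each additional derivative order loses one grid point at each end and the value is approximate. The function name `sobolev_sup_norm` is hypothetical.

```python
import math


def sobolev_sup_norm(b, lo, hi, j, num=2001):
    """Approximate ||b||_j = max over l <= j of sup_{x in [lo, hi]} |d^l b(x)/dx^l|.

    Derivatives are approximated by repeated central differences on a uniform grid;
    each pass drops the two endpoints, so higher orders are checked on a slightly
    smaller interval.
    """
    step = (hi - lo) / (num - 1)
    values = [b(lo + i * step) for i in range(num)]
    best = max(abs(v) for v in values)
    for _ in range(j):
        values = [(values[i + 1] - values[i - 1]) / (2.0 * step) for i in range(1, len(values) - 1)]
        best = max(best, max(abs(v) for v in values))
    return best


if __name__ == "__main__":
    # sup|sin| = sup|cos| = 1 on [0, pi], so ||sin||_2 should be close to 1.
    print(sobolev_sup_norm(math.sin, 0.0, math.pi, 2))
```

For $B(x) = x^2$ on $[0, 2]$ the three terms are $\sup|x^2| = 4$, $\sup|2x| = 4$, and $2$, so $\|B\|_2 = 4$.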

One useful type of result is uniform convergence in probability, as in the conclusion of the following result. Let $m_0(\beta) = E[m(z,\beta,h_0)]$.

Lemma 5.1: Suppose that i) $m(z,\beta,h_0)$ is continuous at each $\beta \in \mathcal{B}$ with probability one, where $\mathcal{B}$ is compact, and $E[\sup_{\beta\in\mathcal{B}}\|m(z,\beta,h_0)\|] < \infty$; ii) Assumptions K, H, and Y are satisfied with $d \ge \Delta+1$, $\ln(n)/(n\sigma^{k+2\Delta}) \to 0$ and $\sigma \to 0$, and there are $b(z)$ and $\varepsilon > 0$ such that $E[b(z)] < \infty$, and for all $\beta \in \mathcal{B}$ and $\|h - h_0\|_\Delta < \varepsilon$, $\|m(z,\beta,h) - m(z,\beta,h_0)\| \le b(z)(\|h - h_0\|_\Delta)^\varepsilon$. Then $E[m(z,\beta,h_0)]$ is continuous on $\mathcal{B}$ and

(5.1)  $\sup_{\beta\in\mathcal{B}} \|n^{-1}\sum_{i=1}^n m(z_i,\beta,\hat h) - E[m(z,\beta,h_0)]\| \xrightarrow{p} 0.$

The uniform convergence conclusion of equation (5.1) is a well known condition for consistency of the solution to equation (2.3). Also, equation (5.1) is useful in showing consistency of an estimator that maximizes an objective function $n^{-1}\sum_{i=1}^n m(z_i,\beta,\hat h)$, where $m$ is a scalar, and is useful for showing consistency of the Jacobian term $n^{-1}\sum_{i=1}^n \partial m(z_i,\hat\beta,\hat h)/\partial\beta$, by letting the $m$ in the statement of the Lemma be each column of the derivative.

Asymptotic normality of $\sqrt{n}\sigma^a\hat m_n(\beta_0)$ is essential for asymptotic normality of $\hat\beta$. This result has two components, which are a linearization around the true $h_0$ and asymptotic normality of the linearization. It is useful to state these two components separately.

Asymptotic normality of the linearization will follow from asymptotic normality of $\sqrt{n}\sigma^a\hat m_n(\beta_0)$ when $m(z,\beta_0,h)$ is a linear functional that does not depend on $z$, say $m(h) = m(z,\beta_0,h)$. The rate of convergence (i.e. the magnitude of $a$) will depend on the nature of $m(h)$. Here the results are grouped into two main ones, the first involving $\sqrt{n}$-consistency. For the moment, assume that $m(h)$ is a scalar.

Lemma 5.2: If $m(h) = \int v(x)'h(x)dx$ where $v(x)$ is zero outside a compact set, continuous almost everywhere, there is $\varepsilon > 0$ such that $E[\sup_{\|u\|\le\varepsilon}\|v(x+u)\|^2 E[\|y\|^2\mid x]] < \infty$, and $\sqrt{n}\sigma^s \to 0$, then for $\delta_i = v(x_i)'y_i$, $\sqrt{n}[m(\hat h) - m(h_0)] = \sum_{i=1}^n\{\delta_i - E[\delta_i]\}/\sqrt{n} + o_p(1)$.
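The representation behind Lemma 5.2 (and used in its proof), $m(\hat h) = \int v(x)'\hat h(x)dx = \sum_i\tilde\delta_i/n$ with $\tilde\delta_i = [\int v(x_i+\sigma u)K(u)du]'y_i$, can be checked numerically. The sketch assumes scalar $x$ and $y$, the Epanechnikov kernel, the illustrative weight $v(x) = 1 - x^2$ on $[-1, 1]$ (zero elsewhere), and midpoint-rule quadrature; all names are hypothetical. For small $\sigma$ both quantities should also be close to $\sum_i v(x_i)y_i/n$, the leading term in the lemma.

```python
import random


def epanechnikov(u):
    """Second-order kernel with bounded support [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def midpoint(fn, lo, hi, m=2000):
    """Composite midpoint rule for a one-dimensional integral."""
    w = (hi - lo) / m
    return w * sum(fn(lo + (r + 0.5) * w) for r in range(m))


def v(x):
    """Illustrative weight: continuous and zero outside the compact set [-1, 1]."""
    return 1.0 - x * x if abs(x) <= 1.0 else 0.0


def m_of_h_hat(x, y, sigma):
    """m(h_hat) = integral of v(t) h_hat(t) dt, with h_hat(t) = sum_i y_i K_sigma(t - x_i) / n."""
    n = len(x)

    def h_hat(t):
        return sum(yi * epanechnikov((t - xi) / sigma) / sigma for xi, yi in zip(x, y)) / n

    return midpoint(lambda t: v(t) * h_hat(t), -1.0 - sigma, 1.0 + sigma)


def mean_delta_tilde(x, y, sigma):
    """sum_i delta_tilde_i / n, delta_tilde_i = [integral v(x_i + sigma u) K(u) du] y_i."""
    total = 0.0
    for xi, yi in zip(x, y):
        total += midpoint(lambda u: v(xi + sigma * u) * epanechnikov(u), -1.0, 1.0) * yi
    return total / len(x)


if __name__ == "__main__":
    random.seed(1)
    x = [random.uniform(-1.2, 1.2) for _ in range(100)]
    y = [1.0 + xi + random.gauss(0.0, 0.2) for xi in x]
    print(m_of_h_hat(x, y, 0.2), mean_delta_tilde(x, y, 0.2),
          sum(v(xi) * yi for xi, yi in zip(x, y)) / len(x))
```

The first two numbers agree up to quadrature error because they are the same integral after the change of variables $u = (x - x_i)/\sigma$.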

Cases where convergence is slower than $1/\sqrt{n}$ are somewhat more complicated. The following assumption is useful for these cases. For the moment let $\ell$ be a nonnegative integer and let $\partial^\ell h(x)/\partial x_1^\ell$ be ordered so that $\partial^\ell[y_iK_\sigma(x - x_i)]/\partial x_1^\ell = y_i \otimes [\partial^\ell K_\sigma(x - x_i)/\partial x_1^\ell]$.

Assumption 5.1: Suppose that $k = k_1 + k_2$, and there are a matrix of functions $\omega(t)$ with domain $\mathbb{R}^{k_2}$, $0 \le k_2 < k$, and a vector of functions $x_1(t)$ in $\mathbb{R}^{k_1}$, such that i) $m(h) = \int\omega(t)[\partial^\ell h(x(t))/\partial x_1^\ell]dt$ for $x(t) = (x_1(t)', t')'$; ii) $\omega(t)$ is bounded and continuous almost everywhere and zero outside a compact set $\mathcal{T}$, and $x_1(t)$ is continuously differentiable with bounded partial derivatives on a convex, compact set $\bar{\mathcal{T}}$ containing $\mathcal{T}$ in its interior; iii) $\Sigma(x) = E[yy'\mid x]$ is continuous a.e., and for $\varepsilon > 0$ and $v(x) = E[\|y\|^4\mid x]$, $\int\sup_{\|\eta\|\le\varepsilon}[1 + v(x_1(t)+\eta, t)]f_0(x_1(t)+\eta, t)dt < \infty$.


The key condition here is the integral representation of $m(h)$. The dimension of the argument being integrated and the order of the derivative lead to the convergence rate $1/(\sqrt{n}\sigma^{k_1/2+\ell})$ for $m(\hat h)$, that is, the normalizing factor is $\sqrt{n}\sigma^{k_1/2+\ell}$. Thus, every additional dimension of integration increases the convergence rate by a factor of $1/\sqrt{\sigma}$, while every additional derivative decreases the rate by a factor of $1/\sigma$. This hypothesis also leads to a specific form for the asymptotic variance of $m(\hat h)$, which for $\mathcal{K}_\ell(u_1,t) = \int \partial^\ell K(u_1 + [\partial x_1(t)/\partial t]v, v)/\partial u_1^\ell\, dv$ is

(5.2)  $V = \int \omega(t)\Big[\Sigma(x(t)) \otimes \int \mathcal{K}_\ell(u_1,t)\mathcal{K}_\ell(u_1,t)'du_1\Big]\omega(t)' f_0(x(t))dt.$
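The kernel factor $\int\mathcal{K}_\ell(u_1,t)^2du_1$ in (5.2) can be evaluated numerically in a simple case. The sketch assumes $\ell = 0$, $k_1 = k_2 = 1$, scalar $y$, and a product Epanechnikov kernel $K(a, b) = \kappa(a)\kappa(b)$; the scalar `slope` plays the role of $\partial x_1(t)/\partial t$, and the function names are hypothetical. With zero slope the factor reduces to $\int\kappa(u)^2du = 3/5$ for the Epanechnikov kernel.

```python
def epanechnikov(u):
    """One-dimensional Epanechnikov kernel on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def midpoint(fn, lo, hi, m=400):
    """Composite midpoint rule for a one-dimensional integral."""
    w = (hi - lo) / m
    return w * sum(fn(lo + (r + 0.5) * w) for r in range(m))


def k_bar(u1, slope):
    """K_0(u1, t) = integral of K(u1 + slope * v, v) dv for the product kernel."""
    return midpoint(lambda v: epanechnikov(u1 + slope * v) * epanechnikov(v), -1.0, 1.0)


def kernel_constant(slope):
    """Integral of K_0(u1, t)^2 du1, the kernel factor in (5.2) for l = 0 and scalar y."""
    reach = 1.0 + abs(slope)  # K_0(u1, t) vanishes for |u1| > 1 + |slope|
    return midpoint(lambda u: k_bar(u, slope) ** 2, -reach, reach)


if __name__ == "__main__":
    for slope in (0.0, 0.5, 1.0):
        print(slope, kernel_constant(slope))
```

A nonzero slope spreads $\mathcal{K}_0(\cdot, t)$ over a wider range of $u_1$, so in this example the factor, and with it the asymptotic variance, is smaller than at zero slope.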


Lemma 5.3: If Assumptions K, H, Y, and 5.1 are satisfied with $d \ge \ell + s$ and, for $a = k_1/2 + \ell$, $\sqrt{n}\sigma^{k_1/2} \to \infty$ and $\sqrt{n}\sigma^{a+s} \to 0$, then $\sqrt{n}\sigma^a[m(\hat h) - m(h_0)] \xrightarrow{d} N(0, V)$.
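To see what the two rate conditions of Lemma 5.3 require, suppose, purely for illustration, that the bandwidth is $\sigma = n^{-\gamma}$. Then $\sqrt{n}\sigma^{k_1/2} \to \infty$ requires $\gamma < 1/k_1$, and $\sqrt{n}\sigma^{a+s} \to 0$ requires $\gamma > 1/(2(a+s))$. The sketch below computes this interval with exact fractions; the helper names are hypothetical.

```python
from fractions import Fraction


def lemma_5_3_exponents(k1, ell, s):
    """For sigma = n**(-gamma), return (a, lo, hi) with a = k1/2 + ell and the open
    interval lo < gamma < hi on which both rate conditions of Lemma 5.3 hold."""
    a = Fraction(k1, 2) + ell
    # sqrt(n) * sigma**(a + s) = n**(1/2 - gamma*(a + s)) -> 0  iff  gamma > 1 / (2(a + s))
    lo = 1 / (2 * (a + s))
    # sqrt(n) * sigma**(k1/2) = n**(1/2 - gamma*k1/2) -> infinity  iff  gamma < 1 / k1
    hi = Fraction(1, k1)
    return a, lo, hi


def normalizing_exponent(gamma, a):
    """Exponent e such that sqrt(n) * sigma**a = n**e when sigma = n**(-gamma)."""
    return Fraction(1, 2) - gamma * a


if __name__ == "__main__":
    a, lo, hi = lemma_5_3_exponents(k1=1, ell=0, s=2)
    gamma = (lo + hi) / 2
    print(a, lo, hi, gamma, normalizing_exponent(gamma, a))
```

Under these assumptions, with $k_1 = 1$, $\ell = 0$, and $s = 2$ the admissible range is $1/5 < \gamma < 1$, so a bandwidth of exact order $n^{-1/5}$ is just excluded and some undersmoothing is needed.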


Asymptotic normality in the more general case where $m(z,\beta_0,h)$ depends on $z$ and is nonlinear in $h$ can be reduced to the previous cases by a linearization. The following result is useful for the linearization. Let $m(z,h) = m(z,\beta_0,h)$, and again assume that this is a scalar.

Lemma 5.4: Suppose that Assumptions K, H, and Y are satisfied, $\mathcal{X}$ is compact, and there are a vector of functionals $D(z,h)$ and nonnegative constants $a$, $\Delta_t \le \Delta$ $(t = 1, 2)$, $\varepsilon > 0$ such that $d \ge \max\{\Delta+1, \Delta_1+s, \Delta_2+s\}$ and i) $D(z,h)$ is linear in $h$ on $\{h : \|h\|_\Delta < \infty\}$; ii) for all $h$ with $\|h - h_0\|_\Delta < \varepsilon$, $\|m(z,h) - m(z,h_0) - D(z,h-h_0)\| \le b(z)\|h - h_0\|_{\Delta_1}\|h - h_0\|_{\Delta_2}$; iii) $\|D(z,h)\| \le b(z)\|h\|_\Delta$ and $E[b(z)^2] < \infty$; iv) for $\eta_{nj} = [\ln(n)/(n\sigma^{k+2j})]^{1/2} + \sigma^s$, $\eta_{n\Delta} \to 0$, $\sqrt{n}\sigma^a E[b(z)]\eta_{n\Delta_1}\eta_{n\Delta_2} \to 0$, and $\sqrt{n}\sigma^{k+\Delta-a} \to \infty$. Then for $m(h) = \int D(z,h)dF(z)$,

$\sqrt{n}\sigma^a\sum_{i=1}^n[m(z_i,\hat h) - m(z_i,h_0)]/n = \sqrt{n}\sigma^a[m(\hat h) - m(h_0)] + o_p(1).$

The conditions of this result imply Fréchet differentiability at $h_0$ of $m(z,h)$ as a function of $h$, in the Sobolev norm $\|h\|_{\max\{\Delta_1,\Delta_2\}}$. The remainder bounds are formulated with different norms, rather than $\Delta = \Delta_1 = \Delta_2$, to allow weaker conditions for asymptotic normality in some cases.
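Condition iv) of Lemma 5.4 can be explored numerically. The sketch below evaluates $\sqrt{n}\sigma^aE[b(z)]\eta_{n\Delta_1}\eta_{n\Delta_2}$ for a bandwidth $\sigma = n^{-\gamma}$, assuming $E[b(z)] = 1$; it illustrates how the term behaves along a few sample sizes and is not a proof that it converges. The function names are hypothetical.

```python
import math


def eta(n, sigma, k, j, s):
    """eta_{nj} = [ln(n) / (n sigma^(k+2j))]^(1/2) + sigma^s from Lemma 5.4 iv)."""
    return math.sqrt(math.log(n) / (n * sigma ** (k + 2 * j))) + sigma ** s


def linearization_term(n, gamma, k, a, d1, d2, s):
    """sqrt(n) sigma^a eta_{n,d1} eta_{n,d2} with sigma = n**(-gamma) and E[b(z)] set to 1."""
    sigma = n ** (-gamma)
    return math.sqrt(n) * sigma ** a * eta(n, sigma, k, d1, s) * eta(n, sigma, k, d2, s)


if __name__ == "__main__":
    for n in (1e4, 1e6, 1e8):
        print(int(n), linearization_term(n, 0.3, k=1, a=0, d1=0, d2=0, s=2))
```

With $k = 1$, $a = \Delta_1 = \Delta_2 = 0$, and $s = 2$, expanding the square shows the term goes to zero exactly when $1/8 < \gamma < 1/2$, so the choice $\gamma = 0.3$ above makes it shrink as $n$ grows, though slowly.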

Asymptotic normality of $\sigma^a\sum_{i=1}^n m(z_i,\hat h)/\sqrt{n}$ can be shown by combining Lemma 5.4 with either Lemma 5.2 or 5.3. In the $\sqrt{n}$-consistent case of Lemma 5.2, it will follow from Lemmas 5.2 and 5.4 that $\sum_{i=1}^n m(z_i,\hat h)/\sqrt{n} = \sum_{i=1}^n\{m(z_i,h_0) + \delta_i - E[\delta_i]\}/\sqrt{n} + o_p(1)$, so that asymptotic normality, with asymptotic variance $\mathrm{Var}(m(z_i,h_0) + \delta_i)$, follows by the central limit theorem. In the slower than $\sqrt{n}$-consistent case, where $m(h) = \int D(z,h)dF(z)$ satisfies the conditions of Lemma 5.3 and $a > 0$, it will be the case that $\sigma^a\sum_{i=1}^n m(z_i,h_0)/\sqrt{n} \xrightarrow{p} 0$, so that $\sigma^a\sum_{i=1}^n m(z_i,\hat h)/\sqrt{n} \xrightarrow{d} N(0,V)$.


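Assumption 5.2: i) $\|m(z,\beta,h) - m(z,\beta_0,h_0)\| \le b(z)[\|\beta - \beta_0\|^\varepsilon + (\|h - h_0\|_\Delta)^\varepsilon]$ and $E[b(z)^2] < \infty$; ii) for $\varepsilon > 0$, $\|\bar\beta - \beta_0\| < \varepsilon$, and $\|\bar h - h_0\|_\Delta < \varepsilon$, there is $D(z,h;\bar\beta,\bar h)$ that is linear on $\{h : \|h\|_\Delta < \infty\}$ satisfying $|m(z,\bar\beta,h) - m(z,\bar\beta,\bar h) - D(z,h-\bar h;\bar\beta,\bar h)| = o(\|h - \bar h\|_\Delta)$ as $\|h - \bar h\|_\Delta \to 0$ for fixed $\bar\beta$ and $\bar h$; iii) $\|D(z,h;\bar\beta,\bar h) - D(z,h;\beta_0,h_0)\| \le b(z)\|h\|_{\Delta_1}(\|\bar\beta - \beta_0\| + \|\bar h - h_0\|_{\Delta_2})$, $\|D(z,h;\beta_0,h_0)\| \le b(z)\|h\|_{\Delta_1}$, and $E[b(z)^4] < \infty$; iv) $\hat\beta = \beta_0 + O_p(\delta_{\beta n})$, $\delta_{\beta n} \to 0$, $\sigma^{a-k-\Delta_1}\delta_{\beta n} \to 0$, $a + s > k + \Delta_1$, $n\sigma^{3k+2\Delta_1+2\Delta_2-2a}/\ln(n) \to \infty$, and $n\sigma^{2k+2\Delta_1-2a} \to \infty$.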


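Lemma 5.5: Suppose that Assumption 5.2 is satisfied. If $m(h) = \int D(z,h;\beta_0,h_0)dF(z)$ satisfies the conditions of Lemma 5.2 then, for $\delta_i = v(x_i)'y_i$, $\sum_{i=1}^n\|\hat\delta_i - \delta_i\|^2/n \xrightarrow{p} 0$ and $\hat V \xrightarrow{p} V = \mathrm{Var}(m(z_i,\beta_0,h_0) + \delta_i)$. If $m(h) = \int D(z,h;\beta_0,h_0)dF(z)$ satisfies the conditions of Lemma 5.3, then $\sigma^{2a}\hat V \xrightarrow{p} V$, for $V$ in equation (5.2).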
Appendix A: Proofs of Theorems


Throughout the appendix $C$ will denote a generic constant that may be different in different uses and $\sum_i = \sum_{i=1}^n$. Also, CS, M, and T will refer to the Cauchy-Schwarz, Markov, and triangle inequalities, respectively, and DCT to the dominated convergence theorem. Before proving the results in the body of the paper it is useful to state and prove some intermediate results.

Proof of Theorem 4.1: The proof proceeds by checking the conditions of Lemmas 5.3 - 5.5. Let $x = (x_1, x_2)$, $\tau(x) = \tau(x_2)$, $\mathcal{X}$ be the compact set of hypothesis iii), and $\|h\| = \|h\|_0 = \sup_{x\in\mathcal{X}}\|h(x)\|$. Let $m(z,h) = \tau(x)h_2(x)/h_1(x)$, $D(z,h;\bar h) = \tau(x)\bar h_1(x)^{-1}[h_2(x) - \{\bar h_2(x)/\bar h_1(x)\}h_1(x)]$, and $D(z,h) = D(z,h;h_0)$. Choose $\varepsilon$ small enough that $f_0(x)$ is bounded below by $2\varepsilon$ for all $x \in \mathcal{X}$. Then for $\|h - h_0\| < \varepsilon$ and $\|\bar h - h_0\| < \varepsilon$,

$|m(z,h) - m(z,\bar h) - D(z,h-\bar h;\bar h)| = |[h_1(x)^{-1}\bar h_1(x) - 1]D(z,h-\bar h;\bar h)| \le C\|h - \bar h\|^2$ and $|D(z,h;\bar h)| \le C\|h\|$.

Let $a = k_1/2$. Then for $\eta_n = [\ln(n)/(n\sigma^k)]^{1/2} + \sigma^s$,

$\sqrt{n}\sigma^a\eta_n^2 = \ln(n)\sigma^{a-k}/\sqrt{n} + 2[\ln(n)]^{1/2}\sigma^{a+s-k/2} + \sqrt{n}\sigma^{2s+a} \to 0$

by $\ln(n)\sigma^{a-k}/\sqrt{n} \to 0$ and $\sqrt{n}\sigma^{2s+a} \to 0$, implying $\sigma$ goes to zero faster than some power of $n$, and by $a + s > k/2$. Also, $\ln(n)\sigma^{a-k}/\sqrt{n} \to 0$ implies that $\sqrt{n}\sigma^{k-a} \to \infty$, so that the rate hypotheses of Lemma 5.4 are satisfied. Thus, the conclusion of Lemma 5.4 holds, with

$m(h) = \int D(z,h;h_0)dF(z) = \int\tau(x(t))f_0(x(t))^{-1}[h_2(x(t)) - g_0(x(t))h_1(x(t))]f_0(t)dt,$ for $t = x_2$ and $x(t) = (x_1, t)$.

Let $\omega(t) = \tau(x(t))f_0(x(t))^{-1}f_0(t)[-g_0(x(t)), 1]$. This function is bounded and continuous a.e. and zero outside a compact set by continuity of $f_0(x)$, $f_0(t)$, and $g_0$, and by the assumption about $\tau$. The other conditions of Assumption 5.1 are also satisfied by hypothesis. Furthermore, $\sqrt{n}\sigma^{a+s} \to 0$ and $\sqrt{n}\sigma^{k_1/2} = \sqrt{n}\sigma^a \to \infty$ by hypothesis and $a \le k - a = k_2 + k_1/2$. Thus, the conclusion of Lemma 5.3 holds, for $V$ in equation (4.3). Then, by the triangle inequality and $\beta_0 = E[m(z,h_0)]$,

$\sqrt{n}\sigma^a(\hat\beta - \beta_0) = \sqrt{n}\sigma^a\sum_i\{m(z_i,\hat h) - E[m(z,h_0)]\}/n$
$= \sqrt{n}\sigma^a\sum_i\{m(z_i,h_0) - E[m(z,h_0)]\}/n + \sqrt{n}\sigma^a[m(\hat h) - m(h_0)] + o_p(1) \xrightarrow{d} N(0,V),$

because $\sqrt{n}\sigma^a\sum_i\{m(z_i,h_0) - E[m(z,h_0)]\}/n \xrightarrow{p} 0$ by $\sigma \to 0$.

To finish the proof, note that it follows from the above arguments and by hypothesis that for $m(z,\beta,h) = m(z,h)$ and $D(z,h;\beta,\bar h) = D(z,h;\bar h)$, as specified above, conditions i) - iii) of Assumption 5.2 are satisfied, with $\Delta = \Delta_1 = \Delta_2 = 0$. Furthermore, condition iv) is satisfied by $\delta_{\beta n} = 0$, $n\sigma^{3k-2a}/\ln(n) \to \infty$, and the fact that this last condition implies $s > 3k/2 - 2a = 3k_2/2 + k_1/2 > k_2 + k_1/2 = k - a$. The second conclusion then follows by the conclusion of Lemma 5.5. QED.
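The quadratic bound at the start of the proof of Theorem 4.1 comes from an exact identity for the ratio functional: $m(z,h) - m(z,\bar h) - D(z,h-\bar h;\bar h) = [h_1(x)^{-1}\bar h_1(x) - 1]D(z,h-\bar h;\bar h)$. The sketch below checks it pointwise, treating the values of $h_1$, $h_2$, $\bar h_1$, $\bar h_2$ at a single $x$ as scalars (an assumption made only for illustration); the function names are hypothetical.

```python
def m_ratio(h1, h2, tau=1.0):
    """m(z, h) = tau(x) h2(x) / h1(x), with the function values at one x treated as scalars."""
    return tau * h2 / h1


def d_lin(dh1, dh2, h1_bar, h2_bar, tau=1.0):
    """D(z, dh; h_bar) = tau h1_bar^{-1} [dh2 - (h2_bar / h1_bar) dh1], linear in dh."""
    return tau / h1_bar * (dh2 - (h2_bar / h1_bar) * dh1)


def remainder(h1, h2, h1_bar, h2_bar, tau=1.0):
    """m(z, h) - m(z, h_bar) - D(z, h - h_bar; h_bar)."""
    return (m_ratio(h1, h2, tau) - m_ratio(h1_bar, h2_bar, tau)
            - d_lin(h1 - h1_bar, h2 - h2_bar, h1_bar, h2_bar, tau))


if __name__ == "__main__":
    h1, h2, h1_bar, h2_bar, tau = 2.5, -1.0, 2.0, 0.5, 0.7
    exact = (h1_bar / h1 - 1.0) * d_lin(h1 - h1_bar, h2 - h2_bar, h1_bar, h2_bar, tau)
    print(remainder(h1, h2, h1_bar, h2_bar, tau), exact)
```

Both factors on the right of the identity are of order $\|h - \bar h\|$ when $h_1$ is bounded away from zero, so halving $h - \bar h$ cuts the remainder by about a factor of four.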

Proof of Theorem 4.2: Let $m(z,h) = a_0(z)h_1(x)^{-1}h_2(x)$ and $D(z,h;\bar h) = a_0(z)\bar h_1(x)^{-1}[h_2(x) - \{\bar h_2(x)/\bar h_1(x)\}h_1(x)]$. The proof that the conditions of Lemma 5.4 are satisfied proceeds exactly as in the proof of Theorem 4.1, except that $a = 0$ and the function $b(z)$ of Lemma 5.4 is taken to be $C|a_0(z)|$. Also, here

$m(h) = \int D(z,h;h_0)dF(z) = E[a_0(z)f_0(x)^{-1}\{h_2(x) - g_0(x)h_1(x)\}] = E[E[a_0(z)\mid x]f_0(x)^{-1}\{h_2(x) - g_0(x)h_1(x)\}] = \int v(x)'h(x)dx$

for $v(x) = E[a_0(z)\mid x]f_0(x)^{-1}f_0(x)(-g_0(x), 1)'$. By hypothesis, the conditions of Lemma 5.2 are satisfied for this $v(x)$, so that by the conclusion of Lemma 5.2, for $\delta_i = v(x_i)'y_i = \psi_i - a_0(z_i)g_0(x_i) + \beta_0$, one obtains $\sqrt{n}[m(\hat h) - m(h_0)] = \sum_i\{\delta_i - E[\delta_i]\}/\sqrt{n} + o_p(1)$. The first conclusion then follows. Also, the second conclusion follows from Lemma 5.5 similarly to the proof of Theorem 4.1. QED.

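Proof of Lemma 5.1: It follows by standard results (e.g. Tauchen, 1985) that $\sup_{\beta\in\mathcal{B}}\|n^{-1}\sum_i m(z_i,\beta,h_0) - E[m(z,\beta,h_0)]\| \xrightarrow{p} 0$ and $E[m(z,\beta,h_0)]$ is continuous in $\beta$. Also, by Lemmas B.1 and B.2, $\|\hat h - h_0\|_\Delta = O_p(\ln(n)^{1/2}(n\sigma^{k+2\Delta})^{-1/2} + \sigma) = o_p(1)$. Therefore,

$\sup_{\beta\in\mathcal{B}}\|n^{-1}\sum_i[m(z_i,\beta,\hat h) - m(z_i,\beta,h_0)]\| \le n^{-1}\sum_i b(z_i)(\|\hat h - h_0\|_\Delta)^\varepsilon \xrightarrow{p} 0,$

so the conclusion follows by T. QED.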


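Proof of Lemma 5.2: By the Fubini theorem, $E[m(\hat h)] = m(E[\hat h])$. Also, by standard results, $\sup_{x\in\mathcal{E}}\|E[\hat h](x) - h_0(x)\| = O(\sigma^s)$ for any compact set $\mathcal{E}$. Then by $v(x)$ zero outside a compact set $\mathcal{E}$, $\sqrt{n}|E[m(\hat h)] - m(h_0)| \le C\sqrt{n}\sup_{x\in\mathcal{E}}\|E[\hat h](x) - h_0(x)\| = O(\sqrt{n}\sigma^s) \to 0$. Let $\tilde\delta_i = [\int v(x)K_\sigma(x - x_i)dx]'y_i = [\int v(x_i + \sigma u)K(u)du]'y_i$, where the last equality follows by a change of variables $u = (x - x_i)/\sigma$, so that $m(\hat h) = \sum_i\tilde\delta_i/n$. By $K(u)$ having bounded support, $\|v(x_i + \sigma u)K(u)\| \le b(x_i)|K(u)|$ for all small enough $\sigma$, where $b(x) = \sup_{\|u\|\le\varepsilon}\|v(x+u)\|$, and $\int b(x_i)|K(u)|du < \infty$. Then by DCT, $\tilde\delta_i \to \delta_i$ with probability one as $\sigma \to 0$. Also, $|\tilde\delta_i| \le Cb(x_i)\|y_i\|$, so by DCT $E[|e_i|^2] \to 0$ for $e_i = \tilde\delta_i - \delta_i$. Then by M, $\sum_i\{e_i - E[e_i]\}/\sqrt{n} \xrightarrow{p} 0$, so

$\sqrt{n}[m(\hat h) - m(h_0)] = \sqrt{n}\{m(\hat h) - E[m(\hat h)]\} + o(1) = \sum_i\{\delta_i - E[\delta_i]\}/\sqrt{n} + \sum_i\{e_i - E[e_i]\}/\sqrt{n} + o(1) = \sum_i\{\delta_i - E[\delta_i]\}/\sqrt{n} + o_p(1).$ QED.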

Proof of Lemma 5.3: Note that $E[m(\hat h)] = m(E[\hat h])$, so by $\omega(t)$ bounded and zero outside $\mathcal{T}$, and by $x_1(t)$ bounded on $\mathcal{T}$, it follows that $\sqrt{n}\sigma^a|E[m(\hat h)] - m(h_0)| \le C\sqrt{n}\sigma^a\|E[\hat h] - h_0\|_\ell = O(\sqrt{n}\sigma^{a+s}) \to 0$. Therefore, it suffices to show that $\sqrt{n}\sigma^a\{m(\hat h) - E[m(\hat h)]\} \xrightarrow{d} N(0,V)$.

Let $K_\ell(u)$ denote $\partial^\ell K(u)/\partial u_1^\ell$ and

$p_\sigma(x) = \sigma^{-k-\ell}\int\omega(t)[I \otimes K_\ell((x(t) - x)/\sigma)]dt = \sigma^{-k_1-\ell}\int\omega(x_2 + \sigma v)[I \otimes K_\ell((x_1(x_2 + \sigma v) - x_1)/\sigma, v)]dv,$

where $I$ is an identity matrix with the same dimension as $y$ and the last equality follows by the change of variables $v = (t - x_2)/\sigma$. Then $m(\hat h) = \sum_i p_\sigma(x_i)y_i/n$. Thus, to show $\sqrt{n}\sigma^a\{m(\hat h) - E[m(\hat h)]\} \xrightarrow{d} N(0,V)$ it suffices, by the Liapunov central limit theorem, to show that $\sigma^{2a}\mathrm{Var}(p_\sigma(x_i)y_i) \to V$ and $\sigma^{4a}E[\|p_\sigma(x_i)y_i\|^4]/n \to 0$. By i.i.d. data and $\sqrt{n} \to \infty$, $\sigma^a\|E[p_\sigma(x_i)y_i] - m(h_0)\| = \sigma^a\|E[m(\hat h)] - m(h_0)\| \to 0$, and hence $\sigma^a\|E[p_\sigma(x_i)y_i]\| \to 0$. Therefore, to show $\sigma^{2a}\mathrm{Var}(p_\sigma(x_i)y_i) \to V$ it suffices to show that $\sigma^{2a}E[p_\sigma(x_i)y_iy_i'p_\sigma(x_i)'] \to V$.

By $K(u)$ having bounded support, $K_\ell(u_1,v)$ is zero for all $v$ outside a bounded set $\mathcal{V}$. Let $\bar{\mathcal{T}}$ be a compact, convex set containing $\mathcal{T}$ in its interior. Then, for small enough $\sigma$, if $x_2 \notin \bar{\mathcal{T}}$ then $x_2 + \sigma v \notin \mathcal{T}$ for all $v \in \mathcal{V}$, so $p_\sigma(x)$ is zero for $x_2 \notin \bar{\mathcal{T}}$. For $x_2 \in \bar{\mathcal{T}}$ and $x_2 + \sigma v \in \bar{\mathcal{T}}$, continuous differentiability of $x_1(t)$ and a mean value expansion give $[x_1(x_2 + \sigma v) - x_1(x_2)]/\sigma = [\partial x_1(\bar x_2)/\partial t]v$, with $\bar x_2$ between $x_2$ and $x_2 + \sigma v$, which is bounded over $v \in \mathcal{V}$ and converges to $J(x_2)v$ as $\sigma \to 0$, for $J(t) = \partial x_1(t)/\partial t$. Therefore,

$\sigma^{k_1+\ell}p_\sigma(x_1(x_2) - \sigma u, x_2) = \int\omega(x_2 + \sigma v)[I \otimes K_\ell([x_1(x_2 + \sigma v) - x_1(x_2)]/\sigma + u, v)]dv$

is zero for all $u$ outside a compact set, is bounded, and converges to $\int\omega(x_2)[I \otimes K_\ell(J(x_2)v + u, v)]dv$ by the dominated convergence theorem. Then by the change of variables $u = [x_1(x_2) - x_1]/\sigma$ and $t = x_2$,

(A.6)  $\sigma^{k_1+2\ell}E[p_\sigma(x_i)y_iy_i'p_\sigma(x_i)'] = \sigma^{2k_1+2\ell}\int\int p_\sigma(x_1(t) - \sigma u, t)\Sigma(x_1(t) - \sigma u, t)p_\sigma(x_1(t) - \sigma u, t)'f_0(x_1(t) - \sigma u, t)\,dt\,du \to V.$

Also,

$\sigma^{4a}E[\|p_\sigma(x_i)y_i\|^4]/n \le \sigma^{4a}E[\|p_\sigma(x_i)\|^4\|y_i\|^4]/n \le \sigma^{4a}E[\|p_\sigma(x_i)\|^4\{1 + v(x_i)\}]/n$
$= \sigma^{k_1+4a}\int\int\|p_\sigma(x_1(t) - \sigma u, t)\|^4\{1 + v(x_1(t) - \sigma u, t)\}f_0(x_1(t) - \sigma u, t)\,dt\,du/n \le C/(n\sigma^{k_1}) \to 0.$

The conclusion then follows by the Liapunov central limit theorem. QED.

Proof of Lemma 5.4: By $d \ge \Delta + 1$ and Lemma B.3, $\|\hat h - h_0\|_\Delta \xrightarrow{p} 0$. Then by hypotheses ii) and iv),

$\|\sigma^a n^{-1/2}\sum_i\{m(z_i,\hat h) - m(z_i,h_0) - D(z_i,\hat h - h_0)\}\| \le \sigma^a\sqrt{n}[\sum_i b(z_i)/n]\|\hat h - h_0\|_{\Delta_1}\|\hat h - h_0\|_{\Delta_2} = O_p(\sqrt{n}\sigma^aE[b(z)]\eta_{n\Delta_1}\eta_{n\Delta_2}) = o_p(1).$

By linearity of $D(z,h)$, $\sum_i D(z_i,\hat h)/n = \sum_{i,j}D_{ij}/n^2$ for $D_{ij} = D(z_i, y_jK_\sigma(\cdot - x_j))$ and $\sum_{i,j} = \sum_{i=1}^n\sum_{j=1}^n$. Let $D_{i\cdot} = E[D_{ij}\mid z_i]$ and $D_{\cdot j} = E[D_{ij}\mid z_j]$ for $j \ne i$. Note that $E[D_{ii}^2] \le CE[b(z)^2\|y\|^2]\sigma^{-2k-2\Delta}$ and, for $i \ne j$, $E[D_{ij}^2] \le CE[b(z)^2]E[\|y\|^2]\sigma^{-2k-2\Delta}$. Then by a V-statistic projection on the basic observations (e.g. Serfling, Lemma 5.2.2b),

$\sqrt{n}\sigma^a|n^{-2}\sum_{i,j}\{D_{ij} - D_{i\cdot} - D_{\cdot j} + E[D_{i\cdot}]\}| = O_p(\sqrt{n}\sigma^a\{(E[D_{ii}^2])^{1/2} + (E[D_{ij}^2])^{1/2}\}/n) = O_p(\sigma^{a-k-\Delta}/\sqrt{n}) \xrightarrow{p} 0.$

By Lemma B.4, $D_{i\cdot} = D(z_i, E[\hat h])$. By $\sigma \to 0$, $\|E[\hat h] - h_0\|_\Delta \to 0$. Then $E[D(z_i, E[\hat h] - h_0)^2] \le E[b(z)^2](\|E[\hat h] - h_0\|_\Delta)^2 \to 0$. Thus, by Chebyshev's inequality, $\sigma^a\sum_i\{D_{i\cdot} - E[D_{i\cdot}] - D(z_i,h_0) + E[D(z_i,h_0)]\}/\sqrt{n} \xrightarrow{p} 0$. Then the conclusion follows by T. QED.


Proof of Lemma 5.5: It follows by a standard argument, similar to the proof of Lemma 5.1, that $\sum_{i=1}^n\|m(z_i,\hat\beta,\hat h) - m(z_i,\beta_0,h_0)\|^2/n \xrightarrow{p} 0$. Let $\hat D_{ij} = D(z_i, y_jK_\sigma(\cdot - x_j);\hat\beta,\hat h)$ and $D_{ij} = D(z_i, y_jK_\sigma(\cdot - x_j);\beta_0,h_0)$, $\bar\delta_i = \sum_{j=1}^n D_{ji}/n$, and $\tilde\delta_i = E[D_{ji}\mid z_i] = \int D(z, y_iK_\sigma(\cdot - x_i);\beta_0,h_0)dF(z)$. By Lemma B.5, $\hat\delta_i = \sum_{j=1}^n\hat D_{ji}/n$. Also, by Assumption 5.2,

$\|\hat D_{ij} - D_{ij}\| \le b(z_i)\|y_jK_\sigma(\cdot - x_j)\|_{\Delta_1}(\|\hat\beta - \beta_0\| + \|\hat h - h_0\|_{\Delta_2}) \le b(z_i)\|y_j\|\sigma^{-k-\Delta_1}O_p(\delta_{\beta n} + \eta_{n\Delta_2}).$

Then by CS,

$\sigma^{2a}\sum_i\|\hat\delta_i - \bar\delta_i\|^2/n \le C\sigma^{2a}\sum_{i,j}\|\hat D_{ji} - D_{ji}\|^2/n^2 \le C\sigma^{2a}[\sum_i b(z_i)^2/n](\sum_j\|y_j\|^2/n)\sigma^{-2k-2\Delta_1}(\|\hat\beta - \beta_0\| + \|\hat h - h_0\|_{\Delta_2})^2$
$= O_p([\sigma^{a-k-\Delta_1}(\delta_{\beta n} + \eta_{n\Delta_2})]^2) = o_p(1).$

By the data i.i.d.,

$E[\sigma^{2a}\sum_i\|\bar\delta_i - \tilde\delta_i\|^2/n] \le \sigma^{2a}E[\|\bar\delta_1 - \tilde\delta_1\|^2] \le C\sigma^{2a}(E[\|n^{-1}\sum_{j=2}^n(D_{j1} - \tilde\delta_1)\|^2] + n^{-2}E[\|D_{11}\|^2] + n^{-2}E[\|\tilde\delta_1\|^2])$
$\le C\sigma^{2a}n^{-1}(E[\|D_{21}\|^2] + E[\|D_{11}\|^2]) \le C\sigma^{2a}n^{-1}\sigma^{-2k-2\Delta_1} \to 0,$

so by M, $\sigma^{2a}\sum_i\|\bar\delta_i - \tilde\delta_i\|^2/n \xrightarrow{p} 0$. Under the conditions of Lemma 5.2, it was shown in the proof of Lemma 5.2 that $E[\|\tilde\delta_i - \delta_i\|^2] \to 0$, so that $\sum_i\|\hat\delta_i - \delta_i\|^2/n \xrightarrow{p} 0$ follows by T. Then $\hat V \xrightarrow{p} V$ follows by T and the law of large numbers. Under the conditions of Lemma 5.3 note that $\tilde\delta_i = p_\sigma(x_i)y_i$ for $p_\sigma(x)$ defined in the proof of Lemma 5.3. As shown in that proof, $\sigma^aE[\tilde\delta_i] \to 0$, $\sigma^{2a}E[\tilde\delta_i\tilde\delta_i'] \to V$, and $n^{-1}\sigma^{4a}E[\|\tilde\delta_i\|^4] \to 0$. Therefore, by M,

$\sigma^{2a}n^{-1}\sum_i(\tilde\delta_i - \sum_j\tilde\delta_j/n)(\tilde\delta_i - \sum_j\tilde\delta_j/n)' \xrightarrow{p} V,$

so the conclusion follows by T. QED.
Appendix B: Technical Details


This appendix derives rates for uniform convergence in probability in Sobolev norms for derivatives of kernel estimators. Recall from the text that for a closed set $\mathcal{X}$, $\|h\|_j = \max_{\ell\le j}\sup_{x\in\mathcal{X}}\|\partial^\ell h(x)/\partial x^\ell\|$.


Lemma B.1: Suppose that $E[\|y\|^p] < \infty$ for $p > 2$, $E[\|y\|^p\mid x]f_0(x)$ is bounded, $\mathcal{X}$ is compact, Assumption K is satisfied for $\Delta \ge j$, and $\sigma = \sigma(n)$ is such that $\sigma(n)$ is bounded and $n^{1-(2/p)}\sigma(n)^k/\ln(n) \to \infty$. Then

(B.1)  $\|\hat h - E[\hat h]\|_j = O_p(\ln(n)^{1/2}(n\sigma^{k+2j})^{-1/2}).$

Proof: It suffices to prove the result for $y$ a scalar. For each $\ell \le j$, by $K(u)$ having bounded support the order of differentiation and integration can be interchanged to obtain $E[\partial^\ell\hat h(x)/\partial x^\ell] = \partial^\ell E[\hat h](x)/\partial x^\ell$. Next, let $\hat H(x)$ denote any $\ell$th order partial derivative of $\hat h(x)$, and $\kappa(x)$ the corresponding derivative of $K(x)$, so that $\hat H(x) = n^{-1}\sigma^{-k-\ell}\sum_{i=1}^n y_i\kappa((x - x_i)/\sigma)$ and $\partial^\ell E[\hat h](x)/\partial x^\ell = E[\hat H(x)]$, where the $n$ argument of $\sigma(n)$ is suppressed for notational convenience. Also, for a constant $P$, let $y_{in} = y_i$ if $|y_i| \le Pn^{1/p}$; $y_{in} = Pn^{1/p}$ if $y_i > Pn^{1/p}$; and $y_{in} = -Pn^{1/p}$ if $y_i < -Pn^{1/p}$. Let $\bar H(x) = n^{-1}\sigma^{-k-\ell}\sum_{i=1}^n y_{in}\kappa((x - x_i)/\sigma)$. Note that by Bonferroni's inequality,

(B.2)  $\mathrm{Prob}(\bar H(x) \ne \hat H(x) \text{ for some } x) \le \mathrm{Prob}(y_{in} \ne y_i \text{ for some } i \le n) \le n\,\mathrm{Prob}(y_{in} \ne y_i) \le n\,\mathrm{Prob}(|y_i| > Pn^{1/p}) \le E[|y_i|^p]/P^p.$

Let $\delta = [\ln(n)/(n\sigma^{k+2\ell})]^{1/2}$. For $c(x) = E[|y_i|^p\mid x_i = x]$ and $P$ fixed, by $c(x)f_0(x)$ bounded and $p > 2$,

(B.3)  $\delta^{-1}|E[\bar H(x)] - E[\hat H(x)]| \le \delta^{-1}\sigma^{-k-\ell}E[1(|y_i| > Pn^{1/p})|y_i||\kappa((x - x_i)/\sigma)|]$
$\le C\delta^{-1}\sigma^{-k-\ell}n^{(1/p)-1}E[E[|y_i|^p\mid x_i]|\kappa((x - x_i)/\sigma)|]$
$= C\delta^{-1}\sigma^{-\ell}n^{(1/p)-1}\int|\kappa(v)|c(x - \sigma v)f_0(x - \sigma v)dv = O(\sigma^{k/2}n^{(1/p)-1/2}/\ln(n)^{1/2}) = o(1).$

Next, by $\kappa(x)$ Lipschitz, $\sup_{\|x - \tilde x\| \le n^{-3}}|\bar H(x) - \bar H(\tilde x)| \le Cn^{(1/p)-3}\sigma^{-k-\ell-1}$. Also, by $\mathcal{X}$ compact, it can be covered by less than $Cn^{3k}$ open balls of radius $n^{-3}$. Let $x_{jn}$ denote the centers of these open balls $(j = 1, \ldots, J(n))$, $J(n) \le Cn^{3k}$. Then for $x_n(x)$ equal to the center of an open ball containing $x$, by $|E[\bar H(x)] - E[\bar H(\tilde x)]| \le E[|\bar H(x) - \bar H(\tilde x)|]$ it follows that

(B.4)  $\sup_{\mathcal{X}}|\bar H(x) - E[\bar H(x)]| \le \sup_{\mathcal{X}}|\bar H(x) - E[\bar H(x)] - \{\bar H(x_n(x)) - E[\bar H(x_n(x))]\}| + \sup_{\mathcal{X}}|\bar H(x_n(x)) - E[\bar H(x_n(x))]|$
$\le Cn^{(1/p)-3}\sigma^{-k-\ell-1} + \max_{j\le J(n)}|\bar H(x_{jn}) - E[\bar H(x_{jn})]|.$

Note that $n^{5-(2/p)} \ge n^3$ and, by $\sigma(n)$ bounded, $\sigma^{3k} \le C\sigma^{k+2}$. Then for the constant $C$ in eq. (B.4), it follows by $p \ge 2$ and $n\sigma^k \to \infty$ that for all $M$, and $n$ big enough,

$M\delta - Cn^{(1/p)-3}\sigma^{-k-\ell-1} = M\delta(1 - C/[M^2\ln(n)n^{5-(2/p)}\sigma^{k+2}]^{1/2}) \ge M\delta(1 - C/[M^2\ln(n)(n\sigma^k)^3]^{1/2}) > M\delta/2.$

Also, note that $n^{1/p}\sigma^\ell\delta = [n^{1-(2/p)}\sigma^k/\ln(n)]^{-1/2} \to 0$. As usual for kernel estimators,

$\sigma^{-2k-2\ell}E[y_{in}^2\kappa((x - x_i)/\sigma)^2] \le \sigma^{-2k-2\ell}E[y_i^2\kappa((x - x_i)/\sigma)^2] = \sigma^{-k-2\ell}\int\kappa(u)^2E[y_i^2\mid x_i = x - \sigma u]f_0(x - \sigma u)du \le C\sigma^{-k-2\ell}$

by $E[y^2\mid x]f_0(x)$ bounded. Then by eq. (B.4), $y_{in}\sigma^{-k-\ell}\kappa((x - x_i)/\sigma)$ bounded by $Cn^{1/p}\sigma^{-k-\ell}$, and Bernstein's inequality, for $M$ and $n$ large enough,

(B.5)  $\mathrm{Prob}(\sup_{\mathcal{X}}|\bar H(x) - E[\bar H(x)]| > M\delta) \le \mathrm{Prob}(\max_{j\le J(n)}|\bar H(x_{jn}) - E[\bar H(x_{jn})]| > M\delta/2)$
$\le \sum_{j=1}^{J(n)}\mathrm{Prob}(|\bar H(x_{jn}) - E[\bar H(x_{jn})]| > M\delta/2)$
$\le 2\sum_{j=1}^{J(n)}\exp(-n^2M^2\delta^2/\{C[n\mathrm{Var}(\sigma^{-k-\ell}y_{in}\kappa((x_{jn} - x_i)/\sigma)) + n^{1+(1/p)}\sigma^{-k-\ell}M\delta]\})$
$\le Cn^{3k}\exp(-nM^2\delta^2/\{C[\sigma^{-k-2\ell} + n^{1/p}\sigma^{-k-\ell}M\delta]\}) \le Cn^{3k}\exp(-CM^2n\sigma^{k+2\ell}\delta^2/(1 + Mn^{1/p}\sigma^\ell\delta))$
$\le Cn^{3k}\exp(-CM^2\ln(n)) \le C\exp(-[CM^2 - 3k]\ln(n)).$

Since these inequalities hold for any $M$, and $n$ large enough, it follows that $\sup_{\mathcal{X}}|\bar H(x) - E[\bar H(x)]| = O_p(\delta)$. Then by eq. (B.3) and the triangle inequality, $\sup_{\mathcal{X}}|\bar H(x) - E[\hat H(x)]| = O_p(\delta)$. Consider any $\varepsilon > 0$. Choose $P$ so that $E[|y_i|^p]/P^p < \varepsilon/2$, so that by eq. (B.2), $\mathrm{Prob}(\delta^{-1}\sup_{\mathcal{X}}|\hat H(x) - \bar H(x)| > M/2) < \varepsilon/2$ for all $n$. For this fixed $P$, by $\sup_{\mathcal{X}}|\bar H(x) - E[\hat H(x)]| = O_p(\delta)$ there exists $M$ such that $\mathrm{Prob}(\delta^{-1}\sup_{\mathcal{X}}|\bar H(x) - E[\hat H(x)]| > M/2) < \varepsilon/2$ for all $n$. Then by the triangle inequality,

$\mathrm{Prob}(\delta^{-1}\sup_{\mathcal{X}}|\hat H(x) - E[\hat H(x)]| > M) \le \mathrm{Prob}(\delta^{-1}\sup_{\mathcal{X}}|\hat H(x) - \bar H(x)| > M/2) + \mathrm{Prob}(\delta^{-1}\sup_{\mathcal{X}}|\bar H(x) - E[\hat H(x)]| > M/2) < \varepsilon.$

Therefore, $\sup_{\mathcal{X}}|\hat H(x) - E[\hat H(x)]| = O_p(\delta)$. The conclusion then follows by applying this conclusion to each derivative of up to order $j$ and by $\sigma$ bounded. QED.
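A small simulation can illustrate the rate in (B.1) for $j = 0$ and $k = 1$. The sketch assumes a uniform design on $[0, 1]$, $y_i = 1$ (so $\hat h$ is a density estimator and $E[\hat h] = 1$ on the evaluation grid), the Epanechnikov kernel, and $\sigma = n^{-1/5}$; the supremum is taken over a finite grid rather than all of $\mathcal{X}$, and the function names are hypothetical. It only shows that the ratio of the sup deviation to $\ln(n)^{1/2}(n\sigma^k)^{-1/2}$ stays moderate in one design; it does not reproduce the truncation and covering argument of the proof.

```python
import math
import random


def epanechnikov(u):
    """Second-order kernel with bounded support [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def sup_deviation(n, sigma, grid, seed=0):
    """sup over the grid of |f_hat(x) - E[f_hat](x)| for x_i ~ U(0, 1) and y_i = 1.

    For grid points at least sigma away from 0 and 1, E[f_hat](x) = 1 exactly,
    because the kernel integrates to one and the design density is flat there.
    """
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    dev = 0.0
    for x in grid:
        f_hat = sum(epanechnikov((x - xi) / sigma) for xi in xs) / (n * sigma)
        dev = max(dev, abs(f_hat - 1.0))
    return dev


def rate_b1(n, sigma, k=1, j=0):
    """Rate on the right of (B.1): ln(n)^(1/2) (n sigma^(k+2j))^(-1/2)."""
    return math.sqrt(math.log(n) / (n * sigma ** (k + 2 * j)))


if __name__ == "__main__":
    grid = [0.3 + 0.004 * i for i in range(101)]
    for n in (500, 2000):
        sigma = n ** (-0.2)
        print(n, sup_deviation(n, sigma, grid) / rate_b1(n, sigma))
```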

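Lemma B.2: If Assumptions K, H, and Y are satisfied for $d \ge j + s$ then $\|E[\hat h] - h_0\|_j = O(\sigma^s)$.

Proof: Note that $E[\hat h](x) = E[y_iK_\sigma(x - x_i)] = \int h_0(t)[K((x - t)/\sigma)/\sigma^k]dt = \int K(u)h_0(x + u\sigma)du$, so that by $K(u)$ having finite support, $\partial^jE[\hat h](x)/\partial x^j = \int K(u)\partial^jh_0(x + u\sigma)/\partial x^jdu$. Also, by $\int K(u)du = 1$ it follows that $h_0(x) = \int K(u)h_0(x)du$. Then by a Taylor expansion in $\sigma$ around $\sigma = 0$, for constant matrices $C_\ell$ $(\ell = 1, \ldots, s)$ and $\bar\sigma$ between $0$ and $\sigma$,

(B.6)  $\|\partial^jE[\hat h](x)/\partial x^j - \partial^jh_0(x)/\partial x^j\| = \|\sum_{\ell=1}^{s-1}C_\ell\sigma^\ell\int K(u)\{\otimes_{r=1}^{\ell}u\}\otimes\{\partial^{j+\ell}h_0(x)/\partial x^{j+\ell}\}du$
$+ C_s\sigma^s\int K(u)\{\otimes_{r=1}^{s}u\}\otimes\{\partial^{j+s}h_0(x + \bar\sigma u)/\partial x^{j+s}\}du\|$
$\le C\sigma^s[\int|K(u)|\|u\|^sdu]\sup_x\|\partial^{j+s}h_0(x)/\partial x^{j+s}\| \le C\sigma^s.$ QED.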
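The $O(\sigma^s)$ bias in Lemma B.2 can be seen numerically for $j = 0$ and a second-order kernel ($s = 2$). The sketch assumes $k = 1$, the Epanechnikov kernel (whose second moment $\int u^2K(u)du$ is $1/5$), and $h_0 = \sin$; the function names are hypothetical. By a second-order Taylor expansion the bias at $x$ should be close to $\frac{1}{2}\sigma^2(1/5)h_0''(x) = -0.1\sigma^2\sin(x)$, and halving $\sigma$ should cut it by about a factor of four.

```python
import math


def epanechnikov(u):
    """Second-order kernel: integrates to one, zero first moment, second moment 1/5."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def smoothed(h0, x, sigma, m=4000):
    """E[h_hat](x) = integral of K(u) h0(x + u sigma) du, by the midpoint rule on [-1, 1]."""
    w = 2.0 / m
    total = 0.0
    for r in range(m):
        u = -1.0 + (r + 0.5) * w
        total += epanechnikov(u) * h0(x + u * sigma)
    return w * total


def bias(h0, x, sigma):
    """E[h_hat](x) - h0(x) for a scalar regressor (k = 1) and j = 0."""
    return smoothed(h0, x, sigma) - h0(x)


if __name__ == "__main__":
    for sigma in (0.2, 0.1, 0.05):
        b = bias(math.sin, 1.0, sigma)
        print(sigma, b, b / sigma ** 2)  # last column should be near -0.1 * sin(1)
```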


Lemma B.3: If the hypotheses of Lemmas B.1 and B.2 are satisfied and Assumption H is satisfied with $d \ge j + s$ then

(B.7)  $\|\hat h - h_0\|_j = O_p(\ln(n)^{1/2}(n\sigma^{k+2j})^{-1/2} + \sigma^s).$

Proof: Follows by Lemmas B.1 and B.2 and the triangle inequality. QED.


Lemma B.4: If Assumption K is satisfied, $m(h)$ is linear, and $|m(h)| \le C\|h\|_\Delta$, then $E[m(\hat h)] = m(E[\hat h])$.

Proof: Let $g(x) = E[y\mid x]$. By $K(u)$ having finite support and $\mathcal{X}$ compact there is a compact set $\mathcal{E}$ such that $\|g(x)K_\sigma(\cdot - x)\|_\Delta = 0$, and hence $m(g(x)K_\sigma(\cdot - x)) = 0$, for all $x \notin \mathcal{E}$. Hence, by linearity of $m(h)$, $E[m(\hat h)] = \int_{\mathcal{E}}m(g(x)K_\sigma(\cdot - x))f_0(x)dx$ and $m(E[\hat h]) = m(\int_{\mathcal{E}}g(x)K_\sigma(\cdot - x)f_0(x)dx)$. Let $F_J(x)$ be a sequence of measures with finite support that converge in distribution to the distribution of $x$ on $\mathcal{E}$ (e.g. the empirical measure from a sequence of i.i.d. draws) as $J \to \infty$. Then, since $m(g(x)K_\sigma(\cdot - x))$ is continuous and bounded on $\mathcal{E}$, it follows that $\int_{\mathcal{E}}m(g(x)K_\sigma(\cdot - x))F_J(dx) \to E[m(\hat h)]$. Also, since each derivative of $g(x)K_\sigma(\tilde x - x)$ with respect to $\tilde x$ of up to order $\Delta$ is bounded and continuous on $\mathcal{E}$, it follows that $\|\int_{\mathcal{E}}g(x)K_\sigma(\cdot - x)F_J(dx) - E[\hat h]\|_\Delta \to 0$, and hence $m(\int_{\mathcal{E}}g(x)K_\sigma(\cdot - x)F_J(dx)) \to m(E[\hat h])$. Furthermore, by $F_J$ having finite support, $m(\int_{\mathcal{E}}g(x)K_\sigma(\cdot - x)F_J(dx)) = \int m(g(x)K_\sigma(\cdot - x))F_J(dx)$. Then T gives the conclusion. QED.


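Lemma B.5: If Assumption K is satisfied and for given $\bar h$ with $\|\bar h\|_\Delta < \infty$ there is linear $D(h)$ with $|m(h) - m(\bar h) - D(h - \bar h)| = o(\|h - \bar h\|_\Delta)$ as $\|h - \bar h\|_\Delta \to 0$, then $dm(\bar h + \zeta yK_\sigma(\cdot - x))/d\zeta\,|_{\zeta=0} = D(yK_\sigma(\cdot - x))$.

Proof: Let $h_\zeta = \bar h + \zeta yK_\sigma(\cdot - x)$, so that $\|h_\zeta - \bar h\|_\Delta \le |\zeta|\|yK_\sigma(\cdot - x)\|_\Delta \le C|\zeta|$. Then

$|m(h_\zeta) - m(\bar h) - \zeta D(yK_\sigma(\cdot - x))|/|\zeta| = |m(h_\zeta) - m(\bar h) - D(h_\zeta - \bar h)|/|\zeta| = o(\|h_\zeta - \bar h\|_\Delta/|\zeta|) = o(1).$ QED.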
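Lemma B.5 is what makes the correction term $\hat\delta_i$ in (4.4) computable as a derivative: $\hat\delta_i$ is the derivative with respect to $\zeta$ of the estimator evaluated at $\hat h + \zeta y_iK_\sigma(\cdot - x_i)$, with $y_i = (1, q_i)'$. The sketch below compares a central finite difference of $\zeta \mapsto \sum_j a_j h_2(x_j)/h_1(x_j)/n$ in that direction with the closed form for $\hat\delta_i$ in (4.4). It assumes scalar $x$, the Epanechnikov kernel, and simulated data; the function names are hypothetical.

```python
import random


def epanechnikov(u):
    """Second-order kernel with bounded support [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def k_sigma(u, sigma):
    """K_sigma(u) = sigma^{-1} K(u / sigma) for a scalar regressor."""
    return epanechnikov(u / sigma) / sigma


def kernel_fits(x, q, sigma):
    """Return h1_hat(x_j) and h2_hat(x_j) at every sample point x_j."""
    n = len(x)
    h1 = [sum(k_sigma(xj - xl, sigma) for xl in x) / n for xj in x]
    h2 = [sum(ql * k_sigma(xj - xl, sigma) for xl, ql in zip(x, q)) / n for xj in x]
    return h1, h2


def functional(h1, h2, a):
    """sum_j a_j h2(x_j) / h1(x_j) / n, the functional behind beta_hat in (4.4)."""
    return sum(aj * h2j / h1j for aj, h1j, h2j in zip(a, h1, h2)) / len(a)


def delta_numeric(x, q, a, sigma, i, zeta=1e-6):
    """Central difference in zeta of the functional at h_hat + zeta * (1, q_i) K_sigma(. - x_i)."""
    h1, h2 = kernel_fits(x, q, sigma)
    kern = [k_sigma(xj - x[i], sigma) for xj in x]

    def at(z):
        return functional([f + z * kv for f, kv in zip(h1, kern)],
                          [g + z * q[i] * kv for g, kv in zip(h2, kern)], a)

    return (at(zeta) - at(-zeta)) / (2.0 * zeta)


def delta_closed_form(x, q, a, sigma, i):
    """delta_i = sum_j a_j f_hat(x_j)^{-1} [q_i - g_hat(x_j)] K_sigma(x_j - x_i) / n, as in (4.4)."""
    h1, h2 = kernel_fits(x, q, sigma)
    return sum(a[j] / h1[j] * (q[i] - h2[j] / h1[j]) * k_sigma(x[j] - x[i], sigma)
               for j in range(len(x))) / len(x)


if __name__ == "__main__":
    random.seed(2)
    x = [random.uniform(-1.0, 1.0) for _ in range(60)]
    q = [xi ** 2 + random.gauss(0.0, 0.1) for xi in x]
    a = [1.0 if abs(xi) < 0.8 else 0.0 for xi in x]
    print(delta_numeric(x, q, a, 0.3, 0), delta_closed_form(x, q, a, 0.3, 0))
```

The two numbers should agree up to finite-difference error, which is the content of Lemma B.5 applied to the ratio functional of Theorem 4.2.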